[torqueusers] node bad state

Garrick Staples garrick at usc.edu
Wed Nov 30 17:39:44 MST 2005


On Wed, Nov 30, 2005 at 06:02:42AM -0500, Ghislain ESCORNE alleged:
> Garrick Staples wrote:
> 
> >On Wed, Nov 30, 2005 at 05:18:46AM -0500, Ghislain ESCORNE alleged:
> > 
> >
> >>Hello,
> >>I have a problem when I try to submit many jobs which need to run on 
> >>more than one node.
> >>   
> >>
> >
> >What exactly is the problem?  These emails have had a wealth of
> >information, but I'm having trouble grasping the actual observed
> >problem.
> > 
> >
> When I submit a script with
> #PBS -l nodes=1:ppn=2 the script runs correctly,
> but when I submit
> #PBS -l nodes=2:ppn=2 the job stays in the queue (it bounces from state
> R to state Q).
> The pbs_server logs show:
> 
> compute-0-1.local with bad state (state: QUEUED)
> code=15016(Request invalid for state of job)
> 
> Thanks for your help

There are several reasons a job can fail to start.  Do you see any
errors in the MOM logs?  If you don't see anything, increase MOM's log
level.  Also make sure TORQUE was configured with --enable-syslog, and
look in /var/log/messages (or wherever your syslog daemon writes).
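Once syslog support is on, it can help to pull just the TORQUE daemon messages out of the syslog file. The sketch below is a minimal, illustrative Python filter, not part of TORQUE; it assumes the classic BSD syslog line layout (e.g. `Nov 30 17:39:44 node01 pbs_mom[1234]: ...`), the default daemon names pbs_mom, pbs_server and pbs_sched, and /var/log/messages as the default path. The `torque_lines` helper is hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator

# Assumed BSD-style syslog layout: "Mon DD HH:MM:SS host program[pid]: message"
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<prog>[^:\[\s]+)(?:\[\d+\])?: (?P<msg>.*)$"
)

# Assumption: the daemons run under their default TORQUE names.
TORQUE_DAEMONS = {"pbs_mom", "pbs_server", "pbs_sched"}


def torque_lines(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (host, program, message) for syslog lines written by TORQUE daemons."""
    for line in lines:
        m = SYSLOG_RE.match(line.rstrip("\n"))
        if m and m.group("prog") in TORQUE_DAEMONS:
            yield m.group("host"), m.group("prog"), m.group("msg")


if __name__ == "__main__":
    # Default path is an assumption; pass your syslog file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as fh:
        for host, prog, msg in torque_lines(fh):
            print(f"{host} {prog}: {msg}")
```

A plain grep for "pbs_" does much the same; parsing the fields just makes it easier to group messages by host when comparing the head node with compute-0-1.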

And verify the following on all machines:
  - DNS resolution works correctly, with matching forward and reverse lookups
  - the time is synced correctly
  - user accounts exist
  - user home directories can be mounted
  - prologue scripts exit with 0
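Four of these checks (DNS round trip, user account, home directory, prologue exit status) are easy to script and run on every node. This is a rough Python sketch under stated assumptions, not a TORQUE tool: the helper names are hypothetical, the DNS check insists that the exact hostname comes back from the reverse lookup (so a short-name versus FQDN mismatch counts as a failure), checking that the home directory is present is only a crude stand-in for "can be mounted", and the prologue path is whatever your MOM uses (typically mom_priv/prologue under the TORQUE spool directory). When running the prologue by hand, supply whatever arguments it expects.

```python
import os
import pwd
import socket
import subprocess
import sys
from typing import Callable


def dns_round_trip_ok(
    host: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
) -> bool:
    """True if host -> IP -> name maps back to the same host (case-insensitive)."""
    try:
        ip = forward(host)
        name, aliases, _ = reverse(ip)
    except OSError:  # socket.gaierror and socket.herror subclass OSError
        return False
    return host.lower() in {n.lower() for n in [name, *aliases]}


def user_ok(user: str) -> tuple[bool, bool]:
    """Return (account exists, home directory is present) for a user."""
    try:
        entry = pwd.getpwnam(user)
    except KeyError:
        return False, False
    return True, os.path.isdir(entry.pw_dir)


def prologue_ok(path: str, args: list[str], timeout: float = 60) -> bool:
    """Run a prologue script by hand and report whether it exits with status 0."""
    result = subprocess.run([path, *args], capture_output=True, timeout=timeout)
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical usage: check_node.py HOST USER [PROLOGUE [ARGS...]]
    host, user = sys.argv[1], sys.argv[2]
    print(f"{host}: forward/reverse DNS match: {dns_round_trip_ok(host)}")
    exists, home = user_ok(user)
    print(f"{user}: account exists: {exists}, home directory present: {home}")
    if len(sys.argv) > 3:
        ok = prologue_ok(sys.argv[3], sys.argv[4:])
        print(f"prologue {sys.argv[3]} exits with 0: {ok}")
```

Time synchronization is better checked with your NTP tooling than with a script like this.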
