[torqueusers] node bad state
Dave Jackson
jacksond at clusterresources.com
Wed Nov 30 18:22:30 MST 2005
Garrick,
Your comments have been added to the TORQUE FAQ at
http://clusterresources.com/torquedocs/10.1troubleshooting.shtml#faq
Let me know if there is more you would like to add.
Thanks for the contribution!
Dave
PS. I know, we need the WIKI! It's coming...
On Wed, 2005-11-30 at 16:39 -0800, Garrick Staples wrote:
> On Wed, Nov 30, 2005 at 06:02:42AM -0500, Ghislain ESCORNE alleged:
> > Garrick Staples wrote:
> >
> > >On Wed, Nov 30, 2005 at 05:18:46AM -0500, Ghislain ESCORNE alleged:
> > >
> > >
> > >>Hello,
> > >>I have a problem when I try to submit many jobs which need to run on
> > >>more than one node.
> > >>
> > >>
> > >
> > >What exactly is the problem? These emails have had a wealth of
> > >information, but I'm having trouble grasping the actual observed
> > >problem.
> > >
> > >
> > When I submit a script with
> > #PBS -l nodes=1:ppn=2 the script runs correctly
> > but when I submit
> > #PBS -l nodes=2:ppn=2 the job stays in queue (Job bounces from status
> > R to status Q)
> > the logs of pbs_server show :
> >
> > compute-0-1.local with bad state (state: QUEUED)
> > code=15016(Request invalid for state of job)
> >
> > Thanks for your help
>
> There are several reasons why a job will fail to start. Do you see any
> errors in the MOM logs? Be sure to increase the loglevel on MOM if you
> don't see anything. Also be sure TORQUE is configured with
> --enable-syslog and look in /var/log/messages (or wherever your syslog
> writes).
>
> And verify the following on all machines:
> - DNS resolution works correctly, with matching forward and reverse lookups
> - the time is synced correctly
> - user accounts exist
> - user home directories can be mounted
> - prologue scripts exit with 0
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
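
The first item on Garrick's checklist (matching forward and reverse DNS) is easy to script. The following is a minimal sketch, not part of TORQUE: the function name check_forward_reverse and its injectable forward/reverse parameters are assumptions made for illustration, and it only uses the standard socket.gethostbyname and socket.gethostbyaddr calls (IPv4 only).

```python
import socket


def check_forward_reverse(hostname, forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return (ok, detail) for a forward/reverse DNS consistency check.

    forward maps a name to an IPv4 address string; reverse maps an address
    to (primary_name, aliases, addresses), as the socket functions do.
    Both are parameters only so the check can be exercised without DNS.
    """
    try:
        address = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup of {hostname} failed: {exc}"
    try:
        primary, aliases, _ = reverse(address)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup of {address} failed: {exc}"
    # Compare case-insensitively and ignore a trailing dot on FQDNs.
    names = {name.lower().rstrip(".") for name in [primary, *aliases]}
    if hostname.lower().rstrip(".") in names:
        return True, f"{hostname} <-> {address}"
    return False, f"{hostname} -> {address} -> {primary} (mismatch)"
```

Calling check_forward_reverse("compute-0-1.local") on the server and on each compute node should return True as the first element everywhere. A False result on any machine points at the DNS item in the list; the remaining items (time sync, accounts, home directories, prologue exit status) still need to be checked separately.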