[torqueusers] node(s) not accepting jobs
rishi pathak
mailmaverick666 at gmail.com
Wed Apr 29 13:21:04 MDT 2009
What is the CPU load on those nodes? Are any node health check scripts
running? If so, what is their output?
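
To spot this pattern across the whole cluster, a small script can scan the
pestat output for nodes that are free and run no jobs but still report a
non-zero task count, as node42 and node44 do in the output quoted below. This
is only a rough sketch: the function name is made up for illustration, and the
column positions (state in field 2, usrs/jobs in field 8, tasks in field 9)
are an assumption read off the quoted lines, so check them against your own
pestat version.

```python
def stale_task_nodes(pestat_lines):
    """Return nodes that are free, show no jobs, but report tasks > 0.

    Assumes the column layout seen in the quoted pestat output:
    node, state, load, pmem, ncpu, mem, resi, usrs/jobs, tasks, ...
    """
    flagged = []
    for line in pestat_lines:
        fields = line.split()
        if len(fields) < 9:
            # Wrapped continuation lines (e.g. a lone user name) are skipped.
            continue
        node, state, jobs, tasks = fields[0], fields[1], fields[7], fields[8]
        if not tasks.isdigit():
            continue
        if state == "free" and jobs == "0/0" and int(tasks) > 0:
            flagged.append(node)
    return flagged


# Sample lines copied from the quoted pestat output.
sample = [
    "node40 free 0.00 7879 4 16069 231 0/0 0",
    "node41 free 0.00 8067 4 16257 228 0/0 0",
    "node42 free 0.00* 56481 8 58465 269 0/0 88",
    "node43 excl 8.22 64561 8 66545 22975 1/1 8 156354",
    "mikaels",
    "node44 free 0.11* 64561 8 66545 267 0/0 64",
    "node45 excl 8.07 64561 8 66545 21408 1/1 8 156060",
    "NONE* 156227",
]

print(stale_task_nodes(sample))
```

On the sample lines this flags node42 and node44 only; node40 and node41 are
free with a task count of 0, so they are left alone.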
On Wed, Apr 29, 2009 at 12:58 AM, Tony Schreiner <schreian at bc.edu> wrote:
>
> On Apr 28, 2009, at 3:17 PM, Tony Schreiner wrote:
>
> > On a cluster of 62 nodes, with torque 2.1.10 and maui 3.2.6p19
> >
> > overnight 2 nodes have stopped accepting jobs
> >
> > partial pestat output
> >
> > node40 free 0.00 7879 4 16069 231 0/0 0
> > node41 free 0.00 8067 4 16257 228 0/0 0
> > node42 free 0.00* 56481 8 58465 269 0/0 88
> > node43 excl 8.22 64561 8 66545 22975 1/1 8 156354
> > mikaels
> > node44 free 0.11* 64561 8 66545 267 0/0 64
> > node45 excl 8.07 64561 8 66545 21408 1/1 8 156060
> > NONE* 156227
> >
> > There are jobs in the queue, and they are started on other nodes, but
> > not on node42 or node44.
> > node40 and node41 are not eligible for the queue being run so it's ok
> > that they have no jobs.
> >
> > Please note the last column on those 2 nodes which is the "tasks"
> > parameter and is non-zero
> >
> > I have restarted pbs_mom on the nodes, also done momctl -C and momctl
> > -c all on those nodes.
> > There is nothing in the mom_priv directory associated with any job.
> >
>
>
> If I may add one more thing.
> An attempt to force a job to run on the node with qrun -H node42 JOBID
>
> gives the following error
> qrun: Resource temporarily unavailable REJHOST=node42 MSG=cannot
> allocate node 'node42' to job - node not currently available (nps
> needed/free: 1/0, joblist: l.bc.edu 2.6.27.21-170.2.56.fc10.x86_64
> #1 ....
--
Regards--
Rishi Pathak
Pune-Maharastra