[torqueusers] recovery behavior question
Martin Bly
bly at gridpp.rl.ac.uk
Thu Feb 14 11:19:01 MST 2008
On Thu, 14 Feb 2008, John Wang wrote:
> Hello Tim
>
> So you're stopping the pbs_mom daemon on the compute nodes to prevent jobs
> from running on them?
>
> That had been the practice here as well. It just seems to me that we
> shouldn't have to use such workarounds.
qmgr -c "s n nodename state=offline"
on the server works for me. Jobs start running on the node again when I do
qmgr -c "s n nodename state=free"
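
If it helps, here is a minimal sketch that wraps those two calls in shell functions. It assumes qmgr is on the PATH of the server host and that you have manager or operator rights. The function names and the DRY_RUN variable are made up for illustration and are not part of Torque. "set node" is just the spelled-out form of "s n".

```shell
#!/bin/sh
# Sketch only: helpers around the two qmgr commands above.
# set_node_state, offline_node, free_node and DRY_RUN are hypothetical
# names chosen for this example; only qmgr itself comes from Torque.

set_node_state() {
    node="$1"
    state="$2"
    cmd="set node $node state=$state"   # long form of: s n <node> state=<state>
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "qmgr -c \"$cmd\""
    else
        qmgr -c "$cmd"
    fi
}

# Stop the scheduler from placing new jobs on the node.
offline_node() { set_node_state "$1" offline; }

# Return the node to service.
free_node() { set_node_state "$1" free; }
```

Running "DRY_RUN=1; offline_node nodename" only prints the qmgr command, which is a cheap way to check it before touching the server.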
Martin.
>
> Regards,
> John
>
>
> On 2/13/08 8:48 PM, "Tim Freeman" <tfreeman at mcs.anl.gov> wrote:
>
> > If I submit a job with all moms in the pool in the 'down' state, the job sits
> > in the queue as expected. Then I bring up a node but the job in the queue is
> > not run until I submit another job (and they both do run).
> >
> > Is this expected? Is there a setting I am missing to get around this?
> >
> > (Torque 2.2.1, Maui 3.2.6p19 but I saw this with pbs_sched too)
> >
> > Thank you,
> > Tim
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
>
--
---------------------------------------------------------------------------
Martin Bly | Tier 1/A Systems Admin | Rutherford Appleton Laboratory
Email: bly at gridpp.rl.ac.uk Tel: +44|0 1235 446981 Fax: +44|0 1235 446626
---------------------------------------------------------------------------