[torqueusers] recovery behavior question

Martin Bly bly at gridpp.rl.ac.uk
Thu Feb 14 11:19:01 MST 2008


On Thu, 14 Feb 2008, John Wang wrote:

> Hello Tim
> 
> So you're stopping the pbs_mom daemon on the compute nodes to prevent jobs
> from running on them?
> 
> That had been the practice here as well.  It just seems to me that we
> shouldn't have to use such workarounds.

qmgr -c "s n nodename state=offline"

on the server works for me ("s n" is qmgr shorthand for "set node").
Jobs are scheduled on the node again when I do

qmgr -c "s n nodename state=free"
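
If you do this often, the two commands can be wrapped in a small helper.
The following is only a sketch: the function name set_node_state and the
DRY_RUN variable are made up for illustration, and it assumes qmgr is on
the PATH of the server host and that you have manager rights there.  By
default it only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): build the qmgr command that
# marks a node offline or free, and either print it or run it.
# DRY_RUN defaults to 1, so nothing is executed unless DRY_RUN=0 is set.
set_node_state() {
    node=$1
    state=$2

    # Only the two states used in this thread are accepted.
    case "$state" in
        offline|free) ;;
        *) echo "unsupported state: $state" >&2; return 1 ;;
    esac

    # Spelled-out form of the abbreviated "s n" command.
    cmd="set node $node state=$state"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show the command instead of executing it.
        printf 'qmgr -c "%s"\n' "$cmd"
    else
        qmgr -c "$cmd"
    fi
}
```

Calling "set_node_state nodename offline" prints the command; with
DRY_RUN=0 set in the environment it is actually run against the server.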

Martin.


 
> 
> Regards,
> John
> 
> 
> On 2/13/08 8:48 PM, "Tim Freeman" <tfreeman at mcs.anl.gov> wrote:
> 
> > If I submit a job with all moms in the pool in the 'down' state, the job sits
> > in the queue as expected.  Then I bring up a node but the job in the queue is
> > not run until I submit another job (and they both do run).
> > 
> > Is this expected?  Is there a setting I am missing to get around this?
> > 
> > (Torque 2.2.1, Maui 3.2.6p19 but I saw this with pbs_sched too)
> > 
> > Thank you,
> > Tim
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
> 

-- 
   ---------------------------------------------------------------------------
       Martin Bly | Tier 1/A Systems Admin | Rutherford Appleton Laboratory 
    Email: bly at gridpp.rl.ac.uk  Tel: +44|0 1235 446981 Fax: +44|0 1235 446626 
   ---------------------------------------------------------------------------


