[torqueusers] recovery behavior question

Garrick Staples garrick at usc.edu
Thu Feb 14 12:51:37 MST 2008


On Thu, Feb 14, 2008 at 12:12:46PM -0600, John Wang alleged:
> Hello Tim
> 
> So you're stopping the pbs_mom daemon on the compute nodes to prevent jobs
> from running on them?

That's not what he said at all.

 
> That had been the practice here as well. It just seems to me that we
> shouldn't have to use such workarounds.

Something is weird with your install, probably old binaries hanging around.
Please don't project this on to everyone else's systems.

 

> On 2/13/08 8:48 PM, "Tim Freeman" <tfreeman at mcs.anl.gov> wrote:
> 
> > If I submit a job with all moms in the pool in the 'down' state, the job sits
> > in the queue as expected.  Then I bring up a node but the job in the queue is
> > not run until I submit another job (and they both do run).
> > 
> > Is this expected?  Is there a setting I am missing to get around this?
> > 
> > (Torque 2.2.1, Maui 3.2.6p19 but I saw this with pbs_sched too)
> > 
> > Thank you,
> > Tim
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
> 