[torqueusers] recovery behavior question

Garrick Staples garrick at usc.edu
Thu Feb 14 12:53:33 MST 2008


On Thu, Feb 14, 2008 at 06:19:01PM +0000, Martin Bly alleged:
> On Thu, 14 Feb 2008, John Wang wrote:
> 
> > Hello Tim
> > 
> > So you're stopping the pbs_mom daemon on the compute nodes to prevent jobs
> > from running on them?
> > 
> > That had been the practice here as well.  It just seems to me that we
> > shouldn't have to use such workarounds.
> 
> qmgr -c "s n nodename state=offline" 
> 
> on the server works for me.  Jobs pick up on the node again when I do
> 
> qmgr -c "s n nodename state=free" 

Playing with the state bits directly in qmgr is a bad idea.  Some of those bits
(like free) are best left to torque to handle itself.

You can *add* and *subtract* the offline bit in qmgr:
  s n nodename state+=offline
  s n nodename state-=offline

But that is exactly what pbsnodes -o/-c does with an easier syntax.
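
As a rough sketch of the pbsnodes route, the two operations could be wrapped
like this. The function names drain_node and resume_node, the PBSNODES
override variable, and the node name node042 are illustrative assumptions,
not part of TORQUE. Only the -o and -c flags come from pbsnodes itself.

```shell
#!/bin/sh
# Sketch only: assumes the TORQUE client tools are installed and that the
# caller has operator/manager rights on the pbs_server.
# PBSNODES is a hypothetical override so the wrappers can be dry-run
# (for example PBSNODES=echo) without touching a real cluster.
PBSNODES="${PBSNODES:-pbsnodes}"

# Set the offline bit: no new jobs are started on the node,
# and the other state bits are left for TORQUE to manage.
drain_node() {
    "$PBSNODES" -o "$1"
}

# Clear the offline bit so the node can accept jobs again.
resume_node() {
    "$PBSNODES" -c "$1"
}

# Example usage (node042 is a placeholder node name):
#   drain_node node042
#   resume_node node042
```

Unlike state=free, neither call overwrites the bits that the server maintains
on its own, which is the point of using += / -= or pbsnodes -o/-c.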
