[torqueusers] recovery behavior question
garrick at usc.edu
Thu Feb 14 12:53:33 MST 2008
On Thu, Feb 14, 2008 at 06:19:01PM +0000, Martin Bly alleged:
> On Thu, 14 Feb 2008, John Wang wrote:
> > Hello Tim
> > So you're stopping the pbs_mom daemon on the compute nodes to prevent jobs
> > from running on them?
> > That had been the practice here as well. It just seems to me that we
> > shouldn't have to use such workarounds.
> qmgr -c "s n nodename state=offline"
> on the server works for me. Jobs pick up on the node again when I do
> qmgr -c "s n nodename state=free"
Setting the state bits directly in qmgr is a bad idea. Some of those bits
(like free) are best left for TORQUE to manage itself.
Instead, you can *add* and *subtract* just the offline bit:
qmgr -c "s n nodename state+=offline"
qmgr -c "s n nodename state-=offline"
But that is exactly what pbsnodes -o (mark offline) and pbsnodes -c (clear offline) do, with an easier syntax.
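For illustration, here is a minimal sketch of the pbsnodes approach wrapped in two shell helpers. The function names node_offline/node_online and the DRY_RUN switch are hypothetical conveniences, not part of TORQUE. The sketch assumes pbsnodes is on the PATH of an account that is allowed to change node state.

```shell
#!/bin/sh
# Hypothetical helpers around TORQUE's pbsnodes (not part of TORQUE itself).
# If DRY_RUN=1, each command is printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add the offline bit so the scheduler stops placing new jobs on the node.
# Same effect as: qmgr -c "s n NODE state+=offline"
node_offline() {
    run pbsnodes -o "$1"
}

# Remove the offline bit; TORQUE itself decides whether the node is free.
# Same effect as: qmgr -c "s n NODE state-=offline"
node_online() {
    run pbsnodes -c "$1"
}
```

For example, call node_offline node01 before maintenance and node_online node01 afterwards. With DRY_RUN=1 set, each helper only prints the pbsnodes command it would run, which is a cheap way to check the node name before touching a live cluster.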