[torqueusers] taking node offline w/o killing running job

Garrick Staples garrick at usc.edu
Mon Jan 9 12:27:28 MST 2006


On Mon, Jan 09, 2006 at 02:02:59PM -0500, Caird, Andrew J alleged:
> > Use 'pbsnodes -c nodename', all it does is clears the offline bit.
> 
> This sets the node state to free:
> 
> qmgr -c 'p n mor153'
> set node mor153 state = free
> set node mor153 properties = myrinet
> set node mor153 ntype = cluster
> set node mor153 status = opsys=linux
> set node mor153 status += uname=Linux mor153 2.6.9-22.0.1.ELsmp #1 SMP
> Tue Oct 18 18:39:27 EDT 2005 i686
> set node mor153 status += sessions=20069
> set node mor153 status += nsessions=1
> set node mor153 status += nusers=1
> set node mor153 status += idletime=2081510
> set node mor153 status += totmem=1554284kb
> set node mor153 status += availmem=1442688kb
> set node mor153 status += physmem=1554284kb
> set node mor153 status += ncpus=2
> set node mor153 status += loadave=2.00
> set node mor153 status += netload=918193721
> set node mor153 status += state=free
> set node mor153 status += jobs=6068.morpheus.engin.umich.edu
> set node mor153 status += rectime=1136833218
> 
> where it is really in use:
> 
> # qstat -an1 | grep mor153
> 
> 6068.mor  xxxxxxx  yyyyyy zzzzzz --   1  --  --  120:0 R 71:01
> mor153/1+mor153/0
> 
> Before setting it offline ("qmgr -c 's n mor153 state=offline'") and
> then running "pbsnodes -c mor153" it was "state = busy", which I would
> prefer.

"free" is the absence of all state bits.  And "in use" != "busy": a node
that is running a job does not necessarily have the busy bit set.

Just use pbsnodes -o/-c.  Don't use qmgr and you won't overwrite state
bits.

When you used qmgr to manually set the node's state to offline, you
overwrote all state bits, wiping out the busy.  Then 'pbsnodes -c'
removed the offline bit, leaving it with 0 state bits.
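
To make the difference concrete, here is a toy sketch of the two
sequences in plain shell arithmetic.  The bit values and variable names
are made up for illustration; they are not TORQUE's real constants, and
this only models the behaviour described above, not the server's code.

```shell
#!/bin/sh
# Toy model of node state bits. OFFLINE and BUSY are illustrative values,
# NOT the constants TORQUE uses internally.
OFFLINE=1
BUSY=2

state=$BUSY                                 # node starts out busy

# qmgr "state = offline": plain assignment replaces every bit.
assigned=$OFFLINE
# "pbsnodes -c" afterwards: clear the offline bit, nothing is left.
after_assign=$(( assigned & ~OFFLINE ))

# "pbsnodes -o" (or qmgr "state += offline"): OR the bit in, busy survives.
incremented=$(( state | OFFLINE ))
# "pbsnodes -c" (or qmgr "state -= offline"): clear only the offline bit.
after_incr=$(( incremented & ~OFFLINE ))

echo "assign then clear:    $after_assign"  # 0, i.e. "free"
echo "increment then clear: $after_incr"    # 2, i.e. still busy
```

The first path ends with no bits set, which is what shows up as "free";
the second path ends with the busy bit intact.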

Is the node reporting itself as busy?  Look for the state inside of the
status in 'pbsnodes -a nodename'.  If so, then just wait a minute and
the server will pick it up again.
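
If you want to check that without eyeballing the output, something
like the following rough sketch pulls the state= field out of the status
line.  The function name mom_state and the sample text are hypothetical,
and it assumes the status is printed as one comma-separated "status = ..."
line; adjust it if your pbsnodes output looks different.

```shell
#!/bin/sh
# mom_state (hypothetical helper): read pbsnodes-style text on stdin and
# print the state= field found inside the "status = ..." line.
# Assumes status is a single comma-separated line of key=value pairs.
mom_state() {
    awk '/^[[:space:]]*status = / {
        sub(/^[[:space:]]*status = /, "")
        n = split($0, kv, ",")
        for (i = 1; i <= n; i++)
            if (kv[i] ~ /^state=/) {
                sub(/^state=/, "", kv[i])
                print kv[i]
            }
    }'
}

# Illustrative sample only, not captured from a real node.
sample='mor153
     state = free
     status = opsys=linux,ncpus=2,loadave=2.00,state=busy,nusers=1'

printf '%s\n' "$sample" | mom_state
```

Against a live server the usage would be 'pbsnodes -a mor153 | mom_state'.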

Really, just use pbsnodes for this:
  pbsnodes -o mor153
  pbsnodes -c mor153

If you must use qmgr, use the increment/decrement operators (+= and -=),
which is what 'pbsnodes -o/-c' does internally:
  s n mor153 state+=offline
  s n mor153 state-=offline

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California