[torqueusers] getting torque/ pbs to reboot a node periodically.

Garrick Staples garrick at usc.edu
Tue Dec 9 13:31:30 MST 2008


On Tue, Dec 09, 2008 at 09:20:03PM +0100, Bogdan Costescu alleged:
> 
> >First, you need to drain the nodes by marking them offline.  Then 
> >you need to mark them for reboot using the node note.  Then a script 
> >can reboot nodes when it finds them offline, without a job, and 
> >marked for reboot.
> 
> I've recently done something similar (rebooting a node after whatever 
> jobs are running on it finish) using pbs_python, in only a few lines of 
> Python code. There is no extra script looking for the node note. The 
> Python script polls the state of the node until it is only "offline", 
> does whatever it needs to reboot the node, and, as soon as the node 
> goes into state "down", clears the "offline" state.

Without marking the node for reboot in some fashion, how do you know which
nodes to reboot?  Perhaps a node was marked offline for some other reason?

And your script doesn't check to see whether the node has a running job?
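
A minimal sketch of the three-way check described above (offline, no job,
marked for reboot), assuming the usual "name, then indented key = value
lines" layout of `pbsnodes -a` output. The note text "reboot" and the
function names are hypothetical, chosen only for illustration. The actual
reboot and the later clearing of the offline state are left out.

```python
# Hypothetical marker an admin would put in the node note to request a reboot.
REBOOT_NOTE = "reboot"


def parse_pbsnodes(text):
    """Parse `pbsnodes -a` style text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None              # a blank line ends a node record
        elif not line[0].isspace():
            current = line.strip()      # an unindented line is a node name
            nodes[current] = {}
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def nodes_ready_for_reboot(nodes, note=REBOOT_NOTE):
    """Return nodes that are only offline, have no jobs, and carry the note."""
    ready = []
    for name, attrs in nodes.items():
        states = {s.strip() for s in attrs.get("state", "").split(",")}
        if states != {"offline"}:
            continue                    # free, down, job-exclusive, ...
        if attrs.get("jobs"):
            continue                    # still running something
        if attrs.get("note") != note:
            continue                    # offline for some other reason
        ready.append(name)
    return sorted(ready)


# Made-up sample data in the assumed layout, for illustration only.
SAMPLE = """\
node01
     state = offline
     np = 4
     note = reboot

node02
     state = offline
     np = 4
     jobs = 0/1234.server
     note = reboot

node03
     state = offline
     np = 4
     note = bad disk

node04
     state = free
     np = 4
"""

if __name__ == "__main__":
    print(nodes_ready_for_reboot(parse_pbsnodes(SAMPLE)))
```

With the made-up sample, only node01 qualifies: node02 still has a job,
node03 is offline for a different reason, and node04 was never drained.
In real use the text would come from running `pbsnodes -a`, and the note
would be set by whoever drains the node.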

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

See the Dishonor Roll at http://www.californiansagainsthate.com/
