[torqueusers] getting torque/pbs to reboot a node periodically.
Garrick Staples
garrick at usc.edu
Tue Dec 9 13:31:30 MST 2008
On Tue, Dec 09, 2008 at 09:20:03PM +0100, Bogdan Costescu alleged:
>
> >First, you need to drain the nodes by marking them offline. Then
> >you need to mark them for reboot using the node note. Then a script
> >can reboot nodes when it finds them offline, without a job, and
> >marked for reboot.
>
> I've recently done something similar (reboot node after whatever jobs
> run on it finish) using pbs_python in only a few lines of (Python)
> code. There is no extra script looking for the node note, the Python
> script polls the state of the node until it's only "offline", proceeds
> to do whatever it needs to reboot the node and as soon as the node
> goes into state "down" it clears the "offline" state.
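The polling approach described above can be sketched in Python. This is not the pbs_python code mentioned (which isn't shown). It is a minimal sketch in which the functions that query a node's state, reboot it, and clear its offline flag are hypothetical callables supplied by the caller, for example thin wrappers around `pbsnodes` or pbs_python. It assumes the node has already been marked offline, so no new jobs land on it, and that the server reports comma-separated states such as `job-exclusive,offline` and `down,offline`.

```python
import time
from typing import Callable


def reboot_when_drained(
    node: str,
    get_state: Callable[[str], str],       # hypothetical: returns e.g. "offline"
    reboot: Callable[[str], None],          # hypothetical: site-specific reboot
    clear_offline: Callable[[str], None],   # hypothetical: e.g. wraps `pbsnodes -c`
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Reboot an already-offline node once its state is plain "offline",
    then clear the offline flag once the server reports it as "down"."""
    # Phase 1: wait until the state is exactly "offline", e.g. no longer
    # "job-exclusive,offline" while jobs still fill the node.
    while get_state(node) != "offline":
        sleep(poll_seconds)
    reboot(node)
    # Phase 2: the node drops out while rebooting; once the server marks it
    # "down" (typically "down,offline"), clear "offline" so the node rejoins
    # the pool when it comes back up.
    while "down" not in get_state(node).split(","):
        sleep(poll_seconds)
    clear_offline(node)
```

Because the checks follow the description literally, a node that is offline but only partially occupied may still report plain `offline` while a job is running on it; that is the gap raised in the reply.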
Without marking the node for reboot in some fashion, how do you know which
nodes to reboot? Perhaps a node was marked offline for some other reason?
And your script doesn't check to see if it has a running job?
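The drain-and-mark scheme from earlier in the thread (offline, no job, marked for reboot) answers both questions. Here is a minimal Python sketch, assuming the nodes were drained with `pbsnodes -o` and that "marked for reboot" means the node note is the literal string `reboot` (a hypothetical convention; any agreed marker works). It parses the plain-text output of `pbsnodes -a`, where each node name starts a block of indented `key = value` lines. It only lists the candidates, because the actual reboot command is site-specific.

```python
import subprocess
from typing import Dict, List, Optional

# Hypothetical convention: the node note that marks a node for reboot.
REBOOT_NOTE = "reboot"


def parse_pbsnodes(text: str) -> Dict[str, Dict[str, str]]:
    """Parse `pbsnodes -a` output into {node_name: {attribute: value}}."""
    nodes: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate node blocks
        if not line[0].isspace():
            # An unindented line starts a new node block.
            current = line.strip()
            nodes[current] = {}
        elif current is not None and "=" in line:
            # Indented "key = value" line; split on the first "=" only,
            # since values such as `status` contain "=" themselves.
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def nodes_to_reboot(nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return nodes that are offline, have no jobs, and carry the reboot note."""
    ready = []
    for name, attrs in nodes.items():
        states = attrs.get("state", "").split(",")
        is_offline = "offline" in states
        # Assumes `jobs` is absent or empty on a node with nothing running.
        has_jobs = bool(attrs.get("jobs"))
        marked = attrs.get("note", "") == REBOOT_NOTE
        if is_offline and not has_jobs and marked:
            ready.append(name)
    return ready


def main() -> None:
    """Print reboot candidates; the reboot itself is left to site tooling."""
    out = subprocess.run(
        ["pbsnodes", "-a"], capture_output=True, text=True, check=True
    ).stdout
    for node in nodes_to_reboot(parse_pbsnodes(out)):
        print(node)
```

Call `main()` on a host where `pbsnodes` is on the PATH. Requiring the note in addition to the offline state means nodes taken offline for other reasons (hardware faults, maintenance) are left alone, and the `jobs` check keeps a partially occupied offline node from being rebooted under a running job.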
--
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
See the Dishonor Roll at http://www.californiansagainsthate.com/