[torqueusers] getting torque/ pbs to reboot a node periodically.
Bogdan Costescu
Bogdan.Costescu at iwr.uni-heidelberg.de
Tue Dec 9 13:20:03 MST 2008
> First, you need to drain the nodes by marking them offline. Then
> you need to mark them for reboot using the node note. Then a script
> can reboot nodes when it finds them offline, without a job, and
> marked for reboot.
I've recently done something similar (rebooting a node once whatever
jobs are running on it have finished) using pbs_python, in only a few
lines of Python code. There is no extra script looking for the node
note: the Python script polls the state of the node until it is only
"offline", does whatever is needed to reboot the node, and as soon as
the node goes into state "down" it clears the "offline" state.
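
The original script is not shown here, so what follows is only a rough
sketch of the same flow, not the actual code. It drives the pbsnodes
command through subprocess instead of pbs_python, and assumes that
"pbsnodes -o" / "pbsnodes -c" set and clear the offline flag and that
"pbsnodes <node>" prints a "state = ..." line. The function names, the
ssh-based reboot and the polling interval are all made up for
illustration.

```python
import subprocess
import time


def parse_states(pbsnodes_output):
    """Return the set of states found on the 'state = ...' line."""
    for line in pbsnodes_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "state":
            return {s.strip() for s in value.split(",") if s.strip()}
    return set()


def query_states(node):
    """Ask pbsnodes for the node and parse its state line."""
    result = subprocess.run(["pbsnodes", node], capture_output=True,
                            text=True, check=True)
    return parse_states(result.stdout)


def run_pbsnodes(flag, node):
    """Run 'pbsnodes <flag> <node>', e.g. -o (offline) or -c (clear)."""
    subprocess.run(["pbsnodes", flag, node], check=True)


def reboot_via_ssh(node):
    """Assumption: root can ssh to the node without a password.

    The connection usually drops during reboot, so the exit status
    is deliberately ignored.
    """
    subprocess.run(["ssh", node, "reboot"], check=False)


def drain_and_reboot(node, get_states=query_states, set_flag=run_pbsnodes,
                     reboot=reboot_via_ssh, poll_seconds=60,
                     sleep=time.sleep):
    """Drain the node, reboot it, then put it back into service."""
    set_flag("-o", node)  # no new jobs will be started on the node

    # Wait until "offline" is the only state left. As in the post, only
    # the state is checked; a site may also want to look at the node's
    # job list before rebooting.
    while get_states(node) != {"offline"}:
        sleep(poll_seconds)

    reboot(node)

    # Once the server sees the node as down, clear the offline flag so
    # the node is usable again as soon as pbs_mom comes back.
    while "down" not in get_states(node):
        sleep(poll_seconds)
    set_flag("-c", node)


if __name__ == "__main__":
    import sys
    drain_and_reboot(sys.argv[1])
```

The query, flag and reboot steps are passed in as arguments so that
the pbsnodes calls could be swapped for pbs_python ones, or for dummy
functions when trying the logic without a cluster.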
--
Bogdan Costescu
IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de