[torqueusers] getting torque/ pbs to reboot a node periodically.
Bogdan.Costescu at iwr.uni-heidelberg.de
Tue Dec 9 13:42:55 MST 2008
>> There is no extra script looking for the node note, the Python
>> script polls the state of the node until it's only "offline",
>> proceeds to do whatever it needs to reboot the node and as soon as
>> the node goes into state "down" it clears the "offline" state.
> Without marking the node for reboot in some fashion, how do you know
> which nodes to reboot?
The script knows which nodes it needs to reboot; it ignores other
nodes which are in "offline" state. If a node is marked "offline"
manually but the script is still asked to reboot it, what difference
does it make whether the "offline" state was acquired from an admin or
from the script itself, as long as the final result is the same:
draining of the node?
> And your script doesn't check to see if it has a running job?
You missed the 'polls the state of the node until it's only "offline"'
part, or maybe I should have been more verbose and added 'and doesn't
contain other states related to running jobs, like "job-exclusive"'.
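
To make the "only offline" check concrete, here is a minimal sketch of
the polling logic in Python. It is not the actual script. It assumes
that "pbsnodes <node>" prints a line of the form
"state = offline,job-exclusive", that "pbsnodes -o" and "pbsnodes -c"
set and clear the "offline" state, and that the reboot mechanism is
supplied by the caller. The function names, the reboot callable and
the poll interval are all made up for illustration.

```python
import subprocess
import time


def parse_states(pbsnodes_output):
    """Return the set of states found on the 'state = ...' line."""
    for line in pbsnodes_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "state":
            return {s.strip() for s in value.split(",") if s.strip()}
    return set()


def is_drained(states):
    """True when the node is only "offline", i.e. no job-related
    state such as "job-exclusive" is present alongside it."""
    return states == {"offline"}


def node_states(node):
    """Query the current states of one node (assumes pbsnodes is in PATH)."""
    result = subprocess.run(["pbsnodes", node], capture_output=True,
                            text=True, check=True)
    return parse_states(result.stdout)


def drain_and_reboot(node, reboot, poll_seconds=60):
    """Mark the node offline, wait until it is drained, reboot it and
    clear "offline" once the node is seen as "down".

    reboot is a hypothetical caller-supplied callable taking the node name.
    """
    subprocess.run(["pbsnodes", "-o", node], check=True)  # mark offline
    while not is_drained(node_states(node)):
        time.sleep(poll_seconds)
    reboot(node)
    while "down" not in node_states(node):
        time.sleep(poll_seconds)
    subprocess.run(["pbsnodes", "-c", node], check=True)  # clear offline
```

Whether "offline" was set by an admin or by the first pbsnodes call
makes no difference to the loop. In this sketch the poll interval has
to be shorter than the time the node stays "down", otherwise the second
loop can miss the transition.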
IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de