[torqueusers] Re: getting torque/ pbs to reboot a node periodically
rpnabar at gmail.com
Tue Dec 9 12:39:37 MST 2008
>It is actually difficult to do while avoiding possible race conditions.
>First, you need to drain the nodes by marking them offline. Then you need to
>mark them for reboot using the node note. Then a script can reboot nodes when
>it finds them offline, without a job, and marked for reboot.
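For reference, here is a rough sketch of how I read those steps, not a tested recipe. It assumes the marker note is the literal word "reboot" (my choice), that `pbsnodes -o` and `pbsnodes -N` behave as in the pbsnodes man page, and that the per-node record printed by `pbsnodes <node>` has "state = ", "note = " and (only when jobs are present) "jobs = " lines. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: function names and the "reboot" note text are assumptions.

# Drain a node and mark it for reboot (run on the server as a manager).
mark_for_reboot() {
    pbsnodes -o "$1"           # mark offline so no new jobs are scheduled
    pbsnodes -N "reboot" "$1"  # set the node note used as the reboot marker
}

# Read one node record (the output of `pbsnodes <node>`) on stdin.
# Succeeds only if the node is offline, has the "reboot" note,
# and shows no "jobs = " line (assumed to mean it is job-free).
node_ready_for_reboot() {
    awk '
        /^[ \t]*state = /           { if ($0 ~ /offline/) offline = 1 }
        /^[ \t]*jobs = /            { busy = 1 }
        /^[ \t]*note = reboot[ \t]*$/ { marked = 1 }
        END { exit !(offline && marked && !busy) }
    '
}
```

A cron job on the server could then run something like `pbsnodes node01 | node_ready_for_reboot && ssh node01 reboot`, and bring the node back with `pbsnodes -c node01` once it is up again. The node name and the ssh step are placeholders.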
Thanks Garrick! How about rebooting at least those nodes that find
themselves without a job?
Is there a provision for telling pbs to exec a script on a node when
that node finds itself job-free? (This might work better on my older
nodes with only 2 cores per node.)
I could then have this (shell) script check when the node was last
rebooted and, if that was too long ago, reboot it. What do you think?
Idea 2: I'd have to submit dummy jobs from a cron job on the master
node, each designed to run on a specific node. But then again, torque
will not allow a job to execute a reboot command, will it? Maybe if it
were submitted as the root user?
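To make the "rebooted too long ago" check concrete, here is a minimal sketch of what I have in mind. It assumes a Linux node (it reads /proc/uptime), and the 30-day default and the function names are only placeholders I picked. It prints what it would do instead of actually rebooting.

```shell
#!/bin/sh
# Sketch only: threshold and function names are placeholders.
MAX_UPTIME_DAYS=${MAX_UPTIME_DAYS:-30}

# Succeeds when the uptime (in seconds) is greater than max_days.
uptime_exceeds() {
    uptime_seconds=$1
    max_days=$2
    [ "$uptime_seconds" -gt $((max_days * 86400)) ]
}

# Whole seconds since boot, taken from the first field of /proc/uptime.
current_uptime_seconds() {
    cut -d. -f1 /proc/uptime
}

# Report the decision; swap the first echo for the real reboot command
# once the job-free check is in place.
reboot_if_stale() {
    if uptime_exceeds "$(current_uptime_seconds)" "$MAX_UPTIME_DAYS"; then
        echo "uptime over ${MAX_UPTIME_DAYS} days: would reboot"
    else
        echo "uptime within limit: nothing to do"
    fi
}
```

Calling `reboot_if_stale` from whatever hook or dummy job ends up running on the node would be the only extra step. The check does not look at running jobs at all, so it is only safe once the node is known to be job-free.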