[torqueusers] Re: getting torque/ pbs to reboot a node periodically

Rahul Nabar rpnabar at gmail.com
Tue Dec 9 12:39:37 MST 2008


>It is actually difficult to do while avoiding possible race conditions.

>First, you need to drain the nodes by marking them offline.  Then you need to
>mark them for reboot using the node note.  Then a script can reboot nodes when
>it finds them offline, without a job, and marked for reboot.
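
The three steps quoted above can be sketched roughly as follows. This is a minimal sketch, not a tested procedure: the node name node01 and the note text "reboot" are assumptions, and the function name reboot_eligible is made up for illustration. The decision is kept in a pure function so it can be checked without touching a real node.

```shell
#!/bin/sh
# Sketch only. On the server, an admin would drain and mark a node with:
#   pbsnodes -o node01            # mark offline so no new jobs start
#   pbsnodes -N "reboot" node01   # set the node note (marker text is an assumption)

# reboot_eligible STATE JOBS NOTE  (hypothetical helper)
# Succeeds only when the node is offline, has no jobs, and is marked for reboot.
# STATE is the comma-separated state string, e.g. "offline" or "down,offline".
reboot_eligible() {
    state=$1
    jobs=$2
    note=$3
    case ",$state," in
        *,offline,*) ;;          # offline flag present, keep checking
        *) return 1 ;;           # still accepting jobs: not safe
    esac
    [ -z "$jobs" ] || return 1   # a job is still listed on the node
    [ "$note" = "reboot" ]       # must be explicitly marked for reboot
}
```

A caller would read state, jobs and note from the pbsnodes listing for the node and run the reboot only when reboot_eligible succeeds. The node is drained before it is marked so that no job can start between the check and the reboot.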

Thanks Garrick! How about rebooting at least those nodes that find
themselves without a job?

Is there a provision to tell pbs to exec a script on a node when that
node finds itself job-free? (This might work better on my older nodes,
which have only 2 cores per node.)
I could then have this (shell) script check when the node was last
rebooted and, if that was too long ago, reboot it. What do you think of
this idea?
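
A rough sketch of the "rebooted too long ago" check, assuming a Linux node where /proc/uptime is available. The 30-day threshold and the function names are assumptions chosen for illustration, and how the script gets triggered when the node is job-free is left open here. It only prints what it would do.

```shell
#!/bin/sh
MAX_DAYS=30   # assumed threshold, adjust to taste

# Whole seconds since boot, taken from the first field of /proc/uptime.
uptime_seconds() {
    cut -d. -f1 /proc/uptime
}

# needs_reboot UPTIME_SECONDS MAX_DAYS  (hypothetical helper)
# Succeeds when the uptime is strictly greater than MAX_DAYS days.
needs_reboot() {
    [ "$1" -gt $(( $2 * 86400 )) ]
}

# Dry run: report the decision instead of rebooting.
check_node() {
    if needs_reboot "$(uptime_seconds)" "$MAX_DAYS"; then
        echo "would reboot"
    else
        echo "uptime ok"
    fi
}
```

Calling check_node on a node prints the decision. Replacing the echo with a real reboot command would only make sense once the node is known to be job-free.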

Idea 2: I'd have to submit dummy jobs with a cron from the master node,
each designed to run on a specific node. But then again, torque will
not allow a job to execute a reboot command, will it? Maybe if it were
submitted as the root user?
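
For Idea 2, the cron side might look something like the sketch below. It only prints the qsub lines rather than submitting anything. The node names, the job script name reboot_check.sh and the function name are all hypothetical, and whether such a job would have the privileges to reboot is exactly the open question above.

```shell
#!/bin/sh
# print_reboot_jobs PPN NODE...  (hypothetical helper)
# Prints one qsub command per node. Requesting every core on the node
# (ppn=PPN) means the job should only start once the node is otherwise idle.
print_reboot_jobs() {
    ppn=$1
    shift
    for node in "$@"; do
        echo "qsub -l nodes=${node}:ppn=${ppn} reboot_check.sh"
    done
}
```

For example, print_reboot_jobs 2 node01 node02 prints two qsub lines, one per 2-core node, which a cron entry on the master node could then run.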

Any thoughts?

-- 
Rahul

