[torqueusers] Re: getting torque/ pbs to reboot a node periodically.

Rahul Nabar rpnabar at gmail.com
Wed Dec 10 15:36:45 MST 2008


>Why not submit, as a job, "/sbin/reboot"?  Or if permissions would be
>an issue, something suid.  You'd request all resources on the node,
>and a job time of ten minutes.  The point being to occupy a node
>legitimately, and when your time comes as regulated by torque, reboot
>the node.  The job would probably fail, but when the node comes back
>online it should rejoin the queue and be available again, right?

Thanks guys. I need to figure out which option I should use. Too many
alternatives! :)
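
For reference, here is a minimal sketch of the job-submission option quoted
above. It only builds the qsub command line and the job script, and does not
submit anything. The node name, core count, and job name are hypothetical
placeholders, and it assumes qsub accepts the job script on stdin and that the
submitting user is somehow allowed to run /sbin/reboot (e.g. via a suid
wrapper, as suggested in the quote).

```python
import shlex


def build_reboot_qsub(node, ppn, walltime="00:10:00", reboot_cmd="/sbin/reboot"):
    """Return (argv, script) for a Torque job that occupies one node and reboots it.

    node and ppn are site-specific (hypothetical values in the demo below);
    ppn should equal the node's core count so no other job shares the node.
    """
    # Request every core on the named node plus a short walltime, so the
    # scheduler only starts the job once the node is otherwise idle.
    resources = f"nodes={node}:ppn={ppn},walltime={walltime}"
    argv = ["qsub", "-N", "planned-reboot", "-l", resources]
    # The job body is just the reboot command. The job itself will probably
    # be reported as failed, since the node goes down underneath it.
    script = f"#!/bin/sh\n{reboot_cmd}\n"
    return argv, script


if __name__ == "__main__":
    argv, script = build_reboot_qsub("node01", 8)  # hypothetical node and core count
    print(" ".join(shlex.quote(a) for a in argv))
    print(script, end="")
    # To actually submit (assumption: qsub is on PATH and reads the script from stdin):
    #   import subprocess
    #   subprocess.run(argv, input=script, text=True, check=True)
```

Running this from cron every two weeks, once per node, would give the periodic
behaviour, with Torque deciding when each node is actually free to go down.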

OTOH, I never thought we'd need so many hacks to do something like a
planned reboot. I had expected to find a built-in Torque / Maui option.
Is it so uncommon to request a reboot on a compute node?

>Agreed with everything above.  Fix the problems.  Don't reboot unnecessarily.

That would be the neater approach, agreed. But I take a pragmatic
view: 5 minutes lost to a reboot every 2 weeks is way cheaper than a
week of digging into badly documented user code to track down a
zombie process, memory leak, etc.

-- 
Rahul

