[torqueusers] Re: getting torque/ pbs to reboot a node periodically.
Rahul Nabar
rpnabar at gmail.com
Wed Dec 10 15:36:45 MST 2008
>Why not submit, as a job, "/sbin/reboot"? Or if permissions would be
>an issue, something suid. You'd request all resources on the node,
>and a job time of ten minutes. The point being to occupy a node
>legitimately, and when your time comes as regulated by torque, reboot
>the node. The job would probably fail, but when the node comes back
>online it should rejoin the queue and be available again, right?
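A minimal sketch of the suggestion quoted above. It generates a Torque job script that claims every core on one node for ten minutes and then calls /sbin/reboot. The node name, the 8 cores per node, and the queue name "batch" are hypothetical placeholders. The sketch also assumes the job can run /sbin/reboot at all, which is the permissions caveat raised in the quote.

```python
# Sketch of the "submit /sbin/reboot as a job" idea from the quoted text.
# Assumptions (hypothetical): node names, 8 cores per node, a queue called
# "batch", and that the job has the privilege needed to run /sbin/reboot.

REBOOT_JOB_TEMPLATE = """#!/bin/sh
#PBS -N reboot-{node}
#PBS -q {queue}
#PBS -l nodes={node}:ppn={ppn}
#PBS -l walltime=00:10:00
#PBS -j oe
# Starts only once the scheduler can give this job every core on the node,
# typically after the jobs already running there have finished.
/sbin/reboot
"""


def make_reboot_job(node: str, ppn: int, queue: str = "batch") -> str:
    """Return a Torque job script that claims all `ppn` cores on `node`."""
    if ppn < 1:
        raise ValueError("ppn must be at least 1")
    return REBOOT_JOB_TEMPLATE.format(node=node, ppn=ppn, queue=queue)


if __name__ == "__main__":
    # Print the script for one hypothetical node. Save it to a file and
    # submit it with qsub once it looks right.
    print(make_reboot_job("node01", ppn=8))
```

You would save the output as, say, reboot-node01.pbs and submit it with `qsub reboot-node01.pbs`. With one such job per node, each node reboots as soon as it drains, so the cluster is never taken down all at once. As the quote says, the job itself will probably be reported as failed.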
Thanks, guys. I need to figure out which option to use. Too many
alternatives! :)
OTOH, I never thought we'd need so many hacks for something as simple as a
planned reboot. I had expected to find a built-in Torque/Maui option.
Is it really so uncommon to want to reboot a compute node?
>Agreed with everything above. Fix the problems. Don't reboot unnecessarily.
That would be the neater approach, agreed. But I take a pragmatic
view: 5 minutes lost to a reboot every 2 weeks is far cheaper than a
week of digging through badly documented user code to track down a
zombie process, a memory leak, etc.
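To put numbers on that trade-off, here is a quick back-of-envelope calculation using the figures above: 5 minutes of downtime per node, once every 2 weeks. The per-year figure assumes a 365-day year, an assumption added here for illustration.

```python
# Back-of-envelope cost of a scheduled reboot, using the figures from the post:
# about 5 minutes of downtime per node, once every 2 weeks.
REBOOT_MINUTES = 5
INTERVAL_DAYS = 14


def downtime_fraction(reboot_minutes: float, interval_days: float) -> float:
    """Fraction of wall-clock time a node spends rebooting."""
    return reboot_minutes / (interval_days * 24 * 60)


frac = downtime_fraction(REBOOT_MINUTES, INTERVAL_DAYS)
hours_per_year = frac * 365 * 24  # assumes a 365-day year
print(f"{frac:.4%} of node time, about {hours_per_year:.1f} hours per node per year")
# prints: 0.0248% of node time, about 2.2 hours per node per year
```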
--
Rahul