[torqueusers] Re: getting torque/ pbs to reboot a node periodically.

Gabe Turner gabe at msi.umn.edu
Wed Dec 10 15:40:47 MST 2008


On Wed, Dec 10, 2008 at 04:36:45PM -0600, Rahul Nabar wrote:
> >Why not submit, as a job, "/sbin/reboot"?  Or if permissions would be
> >an issue, something suid.  You'd request all resources on the node,
> >and a job time of ten minutes.  The point being to occupy a node
> >legitimately, and when your time comes as regulated by torque, reboot
> >the node.  The job would probably fail, but when the node comes back
> >online it should rejoin the queue and be available again right?
> 
> Thanks guys. I need to figure out which option I should use. Too many
> alternatives! :)
> 
> OTOH, I never thought we'd need so many hacks to do something like a
> planned reboot. I had expected to find a torque / maui inbuilt option.
> Is it so uncommon to request reboot on a compute-node?

I just don't know that it's all that common in practice.  One of the issues
we've had when even attempting this is that there is no guarantee that the
node will come up in a sane state on every single boot.  Even if it works
great 99 out of 100 boots, something going awry could potentially drain
your queue.  I've found users kind of hate that ;)
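
For reference, the reboot-as-a-job idea quoted above could be sketched
roughly as below.  This is only an illustration, not a tested recipe: the
node name "node01", the ppn value of 8, and the job name "planned-reboot"
are placeholders, and the helper build_reboot_qsub is hypothetical.  It only
builds the qsub command line and the job script, it does not submit
anything.  As the quoted message notes, /sbin/reboot needs root, so a real
job would need a suid wrapper or similar.

```python
import shlex


def build_reboot_qsub(node, ppn, walltime="00:10:00"):
    """Build a qsub argv and job script that occupy one node, then reboot it.

    Hypothetical helper: `node` and `ppn` must match how the node is
    defined on your server; the values used below are placeholders.
    """
    # Request every processor on the named node so nothing else shares it.
    resources = f"nodes={node}:ppn={ppn},walltime={walltime}"
    argv = ["qsub", "-N", "planned-reboot", "-l", resources]
    # qsub reads the job script from stdin when no script file is given.
    script = "#!/bin/sh\n/sbin/reboot\n"
    return argv, script


if __name__ == "__main__":
    argv, script = build_reboot_qsub("node01", 8)
    print(shlex.join(argv))
    print(script, end="")
```

Note that this does nothing about the bad-boot case described above; if the
node comes back in a broken state, it can still rejoin the queue.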

Gabe

-- 
Gabe Turner                                             gabe at msi.umn.edu
UNIX System Administrator,
University of Minnesota
Supercomputing Institute                          http://www.msi.umn.edu

