[torqueusers] Special job for reboot

Arnau Bria arnaubria at pic.es
Thu Jan 28 07:05:27 MST 2010


Hi all,

This question is a little OT, but I'd like to hear about other admins' experiences.

Someone already asked this some time ago:

http://www.supercluster.org/pipermail/torqueusers/2008-December/008373.html

But I can't find which solution he implemented or whether it worked.

I've seen a couple of good ideas, like the one from Brock Palen
recommending a job that requests a complete node on a specific host (#PBS
-l host=$host,naccesspolicy=SINGLEJOB), and the other from Garrick:

"First, you need to drain the nodes by marking them offline.  Then you
need to mark them for reboot using the node note.  Then a script can
reboot nodes when it finds them offline, without a job, and marked for
reboot."
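
To make Garrick's idea concrete, here is a minimal sketch in Python of just the
selection step: given the text printed by "pbsnodes -a", pick the nodes that
are offline, have no jobs, and carry a reboot note. Several things here are my
assumptions, not anything from his mail: the function name, the note text
"reboot", the sample node names, and the exact layout of the pbsnodes output
(node name at column 0, indented "key = value" lines below it). As far as I
know the marking itself would be done with "pbsnodes -o" and "pbsnodes -N",
but check that against your Torque version. The reboot itself is left out.

```python
def nodes_ready_for_reboot(pbsnodes_text, marker="reboot"):
    """Return names of nodes that are offline, idle, and marked for reboot.

    Assumes `pbsnodes -a` style text: an unindented node name followed by
    indented `key = value` attribute lines. `marker` is a hypothetical
    note text chosen by the admin.
    """
    nodes = {}
    current = None
    for line in pbsnodes_text.splitlines():
        if not line.strip():
            current = None          # a blank line ends the current node block
            continue
        if not line[0].isspace():
            current = line.strip()  # an unindented line starts a new node
            nodes[current] = {}
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            nodes[current][key] = value

    ready = []
    for name, attrs in nodes.items():
        states = attrs.get("state", "").split(",")
        is_offline = "offline" in states
        has_jobs = bool(attrs.get("jobs"))
        is_marked = marker in attrs.get("note", "")
        if is_offline and not has_jobs and is_marked:
            ready.append(name)
    return ready


# Illustrative sample only; real output has more attributes per node.
SAMPLE = """\
node01
     state = offline
     np = 8
     note = reboot

node02
     state = offline
     np = 8
     jobs = 0/123.server
     note = reboot

node03
     state = free
     np = 8
"""

print(nodes_ready_for_reboot(SAMPLE))
```

In the sample only node01 qualifies: node02 is still running a job and node03
was never marked offline. A cron job could feed the real pbsnodes output to
such a function and reboot whatever it returns, clearing the note afterwards.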

But is anyone actually rebooting nodes via Torque? What steps do you
follow when you need to reboot your farm?

Any experience will be welcome!

Cheers,
Arnau

