[torqueusers] Re: getting torque/ pbs to reboot a node periodically

Garrick Staples garrick at usc.edu
Tue Dec 9 13:56:09 MST 2008

On Tue, Dec 09, 2008 at 09:43:22PM +0100, Justin Finnerty alleged:
> I would also suggest attacking this from the other direction.
> * You said that you wanted to clean up scratch/temporary files.  We had
> the problem of users accidentally leaving data on a node's scratch space.
> Eventually we made the scratch directory writable only by root.  pbs_mom
> (effectively) creates the TMPDIR as root then sets the user ownership
> which gives users access only to per-node disk space which is always
> cleaned up when the job ends (This assumes that your scratch space is not
> /tmp!) As we also clean the scratch directory on a reboot this has
> completely eliminated all our problems with cleaning per-node scratch
> space.
> * The only memory leaks that can affect a node after a job ends are lost
> shared-memory segments.  This topic has been covered before and some
> suggestions for clean-up scripts have appeared on this list.
> * Why worry about zombies?  Unless you have thousands of them, in which
> case I would be jumping on the users to fix their code.  I may be wrong,
> but I think they are just dead entries in the process table and the Linux
> kernel ignores them for scheduling so they should have zero impact on the
> node.

Agreed with everything above.  Fix the problems.  Don't reboot unnecessarily.

> Rebooting the node via a queue has obvious problems.  What do people feel
> about the following.
> * Have a queue administrator create a cron job to submit a job that
> requires all the resources of a node (or the node exclusive job property).
>  All this job does is write a special file into /tmp (eg
> /tmp/go.for.reboot) and quits.
> * Set up your pbs_mom healthcheck script to check for this file and set
> the node 'down' when present.  (Shouldn't this stop a new job starting on
> the node?)
> * Have a cron job on the node that reboots the node when the
> /tmp/go.for.reboot file is present.  (Perhaps you should check the file's
> ownership to verify that some other user is not messing about.)
> * Remove the /tmp/go.for.reboot in your boot scripts (eg rc.local)
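
A minimal sketch of the flag-file check described in the steps above, including the suggested ownership test. The function name `reboot_requested` and the `TRUSTED_UID` value are hypothetical; the uid must be whatever account the queue administrator's job actually runs as. Printing a line that starts with "ERROR" is assumed here to be how the health check reports a problem to pbs_mom; check that against the documentation for your TORQUE version.

```python
import os
import stat

FLAG_PATH = "/tmp/go.for.reboot"  # path taken from the proposal above
TRUSTED_UID = 1000  # hypothetical uid of the queue administrator's account


def reboot_requested(path=FLAG_PATH, trusted_uid=TRUSTED_UID):
    """Return True only if the flag is a regular file owned by trusted_uid."""
    try:
        # lstat does not follow symlinks, so a user cannot point the
        # flag at a file that happens to be owned by the trusted account.
        info = os.lstat(path)
    except FileNotFoundError:
        return False
    return stat.S_ISREG(info.st_mode) and info.st_uid == trusted_uid


if __name__ == "__main__":
    if reboot_requested():
        # Assumed reporting convention; see the lead-in.
        print("ERROR reboot requested via " + FLAG_PATH)
```

The same function could be reused by the node-side cron job before it calls reboot, so both sides apply the same ownership test.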

You haven't solved the race condition.

A new job can start after your job exits and before the next healthcheck run.

A new job can start between the time the "down" bit is set and the cron job's run.

Also, setting the node state to "down" is problematic when using maui because
it is hard-coded to kill jobs that have had "down" nodes for 5 minutes.
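
To make the two windows concrete, here is a small sketch that computes how long each one stays open for a given schedule. The function names and the example periods (a 300-second healthcheck, a 600-second cron job) are hypothetical; real intervals depend on the pbs_mom and cron configuration on the node.

```python
import math


def next_tick(t, period, offset=0.0):
    """Return the first time >= t at which a periodic task fires."""
    k = math.ceil((t - offset) / period)
    return offset + k * period


def race_windows(job_exit, health_period, cron_period,
                 health_offset=0.0, cron_offset=0.0):
    """Return (seconds from job exit until the healthcheck sees the flag,
    seconds from that healthcheck until the cron job reboots the node)."""
    marked_down = next_tick(job_exit, health_period, health_offset)
    reboot = next_tick(marked_down, cron_period, cron_offset)
    return marked_down - job_exit, reboot - marked_down


if __name__ == "__main__":
    # The flag-writing job exits 130 s into the schedule.
    print(race_windows(130, 300, 600))  # (170.0, 300.0)
```

With these example numbers the node sits idle and still schedulable for 170 seconds, then is marked "down" for 300 seconds before the reboot, which is right at the 5-minute limit mentioned above.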

Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

