[torqueusers] pbs_mom request, was Re: PBS_MOM kills running jobs when restarted

Douglas Needham dneedham at cmu.edu
Thu Dec 17 07:12:59 MST 2009


On Mon, 2009-12-14 at 10:37 -0800, Garrick Staples wrote:
> That's what I do.  The init script leaves a /.autopbserror file around (just
> like the autofsck mechanism).
> 
> If found on boot, the node has been ungracefully rebooted and the node is
> marked offline.
> 
> I like making sure that pbs_mom is at least _started_ on boot to allow old jobs
> the chance to exit.

I like that, and I think it could form the basis of a good overall
framework.  Create a file once the node has come up to some point (even
in the rc script for pbs_mom, say), and remove the file during a clean
shutdown.  If the file is already there at boot, the node went down
ungracefully.
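
Roughly what I have in mind, as an untested sketch rather than anyone's
actual init script.  The /.autopbserror path comes from Garrick's
description; the function names, the MARKER override and the
mark_offline hook are made up for illustration.  On a real node the hook
would presumably call something like "pbsnodes -o" for the host, which
assumes the node is allowed to do that.

```shell
#!/bin/sh
# Marker-file sketch: the file exists while the node is "up", so finding
# it at boot means the last shutdown was not clean.
# Path taken from the quoted message; overridable for testing (assumption).
MARKER="${MARKER:-/.autopbserror}"

# Hypothetical hook. A real script might run: pbsnodes -o "$(hostname)"
mark_offline() {
    echo "unclean shutdown detected; node should be marked offline"
}

# Called from the "start" branch of the pbs_mom rc script.
on_boot() {
    if [ -e "$MARKER" ]; then
        mark_offline
    fi
    # (Re)create the marker; it stays until a clean shutdown removes it.
    : > "$MARKER"
    # pbs_mom would still be started here, so old jobs get a chance to exit.
}

# Called from the "stop" branch of the pbs_mom rc script.
on_shutdown() {
    # pbs_mom would be stopped here first.
    rm -f "$MARKER"
}
```

The marker is recreated on every boot, so an offlined node stays flagged
only by the offline state itself, not by the file; clearing the offline
state is left to the admin.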


