[torqueusers] pbs_mom request, was Re: PBS_MOM kills running jobs when restarted

Chris Samuel csamuel at vpac.org
Thu Dec 17 13:07:59 MST 2009


----- "Garrick Staples" <garrick at usc.edu> wrote:

> That's what I do.  The init script leaves
> a /.autopbserror file around (just like the
> autofsck mechanism).
> 
> If found on boot, the node has been ungracefully
> rebooted and the node is marked offline.

It might be safer to invert the logic and only start
pbs_mom if a file is present that is created only on a
clean shutdown (and removed just before pbs_mom starts).

That way, if you've been hit by filesystem corruption
and have lost the .autopbserror file, you won't start
up thinking the system is OK.
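
A rough sketch of that inverted check, written in Python
purely for illustration (a real init script would more
likely be shell). The function names and the idea of
passing the marker path in are my assumptions, not
anything that exists in TORQUE:

```python
import os


def mark_clean_shutdown(marker_path):
    """Call as the last step of a clean shutdown: create the marker file."""
    with open(marker_path, "w"):
        pass


def should_start_mom(marker_path):
    """Return True only if the clean-shutdown marker was present.

    The marker is removed as part of the check, so a later crash
    leaves no marker behind and the next boot sees "not clean".
    Any failure to remove it (missing file, unreadable or damaged
    filesystem) is treated as "not clean".
    """
    try:
        os.remove(marker_path)
    except OSError:
        return False
    return True
```

The boot script would start pbs_mom only when
should_start_mom() returns True, and otherwise leave it
stopped (or mark the node offline) until someone has
looked at the node.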

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

