[torqueusers] pbs_mom request, was Re: PBS_MOM kills running jobs when restarted
Chris Samuel
csamuel at vpac.org
Thu Dec 17 13:07:59 MST 2009
----- "Garrick Staples" <garrick at usc.edu> wrote:
> That's what I do. The init script leaves
> a /.autopbserror file around (just like the
> autofsck mechanism).
>
> If found on boot, the node has been ungracefully
> rebooted and the node is marked offline.
It might be safer to invert the logic: only start
pbs_mom if a marker file is present that is created
only on a clean shutdown (and removed just before
starting pbs_mom).
That way if you've been hit by filesystem corruption
and have lost the .autopbserror file you won't start
up thinking the system is OK.
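A minimal sketch of that inverted check as a POSIX sh init-script fragment. The marker path (CLEAN_FLAG) is a hypothetical name, and the real pbs_mom start/stop commands are left out. The marker is consumed before the daemon would start, so a crash or power loss while pbs_mom is running leaves no marker and the next boot refuses to start:

```shell
#!/bin/sh
# Sketch only: CLEAN_FLAG is a hypothetical marker path, not a TORQUE default.
CLEAN_FLAG="${CLEAN_FLAG:-/var/lib/torque/.clean_shutdown}"

start() {
    if [ -e "$CLEAN_FLAG" ]; then
        # Remove the marker *before* starting, so an unclean stop later
        # means the marker will be missing on the next boot.
        rm -f "$CLEAN_FLAG"
        echo "clean-shutdown marker found; starting pbs_mom"
        # Real script: launch pbs_mom here (omitted in this sketch).
        return 0
    fi
    # Missing marker: unclean shutdown, or the file was lost to
    # filesystem damage. Either way, fail safe and do not start.
    echo "no clean-shutdown marker; not starting pbs_mom" >&2
    return 1
}

stop() {
    # Real script: stop pbs_mom here, and only create the marker
    # if it actually shut down cleanly (omitted in this sketch).
    touch "$CLEAN_FLAG"
}

case "${1:-}" in
    start) start ;;
    stop)  stop ;;
esac
```

The point of the inversion is that every failure mode (crash, lost file, unwritable filesystem) ends in "don't start", whereas a file that flags an error only works if the file itself survives.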
cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency