[torqueusers] pbs_mom request, was Re: PBS_MOM kills running jobs when restarted
Chris Samuel
csamuel at vpac.org
Mon Dec 14 20:05:12 MST 2009
----- "Jerry Smith" <jdsmit at sandia.gov> wrote:
> Now we have a startup item that does a few checks
> ( filesystem availability, some OS checks etc )
> that starts the pbs_mom at boot time if all tests
> pass.
You can set up pbs_mom to do this itself: see the
node_check_script, node_check_interval, etc. options
and the "HEALTH CHECK" section in the pbs_mom manual page.
We run Moab so it spots the errors reported in the
pbs_mom status messages and marks nodes as down
in its internal tables.
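For illustration, here is a rough sketch of the sort of script you could point node_check_script at. It is only an example under assumptions, not what we run: the paths in REQUIRED_PATHS and the helper names are made up, and it relies on my understanding that pbs_mom treats script output beginning with the word ERROR as a failed health check, so please confirm that against the manual page for your TORQUE version.

```python
#!/usr/bin/env python3
"""Hypothetical pbs_mom health check: verify required filesystems are usable.

Assumed mom_priv/config entries (script path is illustrative):
    $node_check_script /path/to/this_script
    $node_check_interval 10
"""
import os

# Hypothetical mount points that jobs on this node depend on.
REQUIRED_PATHS = ["/home", "/scratch"]


def check_paths(paths):
    """Return a list of problem descriptions; empty if every path looks usable."""
    problems = []
    for path in paths:
        if not os.path.isdir(path):
            problems.append(f"{path} is missing or not a directory")
        elif not os.access(path, os.R_OK | os.X_OK):
            problems.append(f"{path} is not readable")
    return problems


def format_report(problems):
    """Build the single status line; empty string means the node is healthy."""
    if not problems:
        return ""
    # Assumption: output starting with ERROR is what pbs_mom reports as a failure.
    return "ERROR " + "; ".join(problems)


def main():
    report = format_report(check_paths(REQUIRED_PATHS))
    if report:
        print(report)


if __name__ == "__main__":
    main()
```

Keeping the checks in a function that returns a list makes it easy to add the OS checks you already have in your startup item without changing how the result is reported.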
cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency