[torqueusers] pbs_mom request, was Re: PBS_MOM kills running jobs when restarted

Chris Samuel csamuel at vpac.org
Mon Dec 14 20:05:12 MST 2009


----- "Jerry Smith" <jdsmit at sandia.gov> wrote:

> Now we have a startup item that does a few checks
> ( filesystem availability, some OS checks etc )
> that starts the pbs_mom at boot time if all tests
> pass.

You can set up your pbs_mom to do this itself; see
the node_check_script, node_check_interval, etc. options
and the "HEALTH CHECK" section in the pbs_mom manual page.

We run Moab so it spots the errors reported in the
pbs_mom status messages and marks nodes as down
in its internal tables.
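
As a rough illustration of the kind of script node_check_script
could point at, here is a minimal sketch. It assumes the convention
described in the HEALTH CHECK section of the man page, where a line of
output starting with ERROR is what pbs_mom reports in its status
message; please verify that against your TORQUE version. The mount
points and function names are hypothetical, not anything from our
setup.

```python
#!/usr/bin/env python3
"""Hypothetical health check sketch for pbs_mom's node_check_script."""
import os

# Hypothetical mount points; replace with the filesystems your jobs need.
REQUIRED_MOUNTS = ["/home", "/scratch"]


def check_mounts(paths, ismount=os.path.ismount):
    """Return a list of problem descriptions, empty if every path is mounted."""
    return ["%s is not mounted" % p for p in paths if not ismount(p)]


def format_report(problems):
    """Return a single line starting with ERROR, or '' if nothing is wrong."""
    if not problems:
        return ""
    return "ERROR " + "; ".join(problems)


def main():
    # Assumption: pbs_mom looks for output beginning with ERROR, so the
    # script stays silent when the node is healthy.
    report = format_report(check_mounts(REQUIRED_MOUNTS))
    if report:
        print(report)


if __name__ == "__main__":
    main()
```

You would then point node_check_script at the installed script in the
pbs_mom config file and set node_check_interval to taste; the exact
syntax and units are in the man page.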

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

