[torqueusers] jobs not clearing on crashed node

Andrus, Brian Contractor bdandrus at nps.edu
Fri Feb 1 08:47:47 MST 2013


Running Torque 4.1.4 here (along with Moab 7.2.0).
Issue: a node that had several elements of an array job running on it crashes.
It reboots, gets re-provisioned, and comes back up.
pbsnodes still claims there are several jobs running on it.
If I run pbs_mom purge on the node, nothing changes.
If I restart pbs_server (which I hate doing since it resets Time Used on running jobs), nothing changes.

Shouldn't the jobs automatically get either restarted or cleared when a node reboots? I'm pretty sure Torque used to do that...

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238
