[torqueusers] jobs not clearing on crashed node

glen.beane at gmail.com
Fri Feb 1 12:29:37 MST 2013



On Feb 1, 2013, at 12:17 PM, dbeer at adaptivecomputing.com wrote:

> Brian,
> 
> pbs_server doesn't consider a job completed until it receives the job's obit from the mom. After a restart, the mom should be started in a mode that tells pbs_server the jobs are no longer running, and that will clear up the jobs. 
> 
> If you have a diskless node, the mom won't know that it had jobs running before the reboot, so you'll need to run qdel -p on the jobs from that mom to clear them. 
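For the manual route David describes, a small helper can turn the jobs that pbsnodes still lists for the node into qdel -p command lines. This is a minimal sketch in Python. It assumes the pbsnodes "jobs" value has the form "slot/jobid, slot/jobid, ..." as in TORQUE 4.x output, so check it against your own pbsnodes output first. The function names and sample job IDs are hypothetical. The helper only prints the commands so they can be reviewed before anything is purged.

```python
import shlex


def job_ids_from_pbsnodes_jobs(jobs_value: str) -> list[str]:
    """Return the unique job IDs in a pbsnodes 'jobs' value, in order.

    Assumes (hypothetically) entries of the form 'slot/jobid' separated
    by commas, e.g. '0/4711[3].master, 1/4711[3].master'.
    """
    ids: list[str] = []
    for entry in jobs_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Keep only the part after the last '/', i.e. drop the slot index.
        job_id = entry.rpartition("/")[2]
        # A multi-slot job appears once per slot; list it only once.
        if job_id not in ids:
            ids.append(job_id)
    return ids


def qdel_purge_commands(jobs_value: str) -> list[str]:
    """Build one shell-quoted 'qdel -p <jobid>' command line per job."""
    return [
        f"qdel -p {shlex.quote(job_id)}"
        for job_id in job_ids_from_pbsnodes_jobs(jobs_value)
    ]


if __name__ == "__main__":
    # Made-up value, as if copied from a 'jobs = ...' line of pbsnodes output.
    sample = "0/4711[3].master, 1/4711[3].master, 2/4711[7].master"
    # Print only; review the list before running anything against pbs_server.
    for cmd in qdel_purge_commands(sample):
        print(cmd)
```

With the sample value this prints qdel -p '4711[3].master' and qdel -p '4711[7].master'. Array-element IDs contain square brackets, which a shell would otherwise treat as a glob pattern, so the IDs are quoted.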

I think I fixed this a long time ago (in 2.x). I made it so that if the mom had no record of the job, pbs_server would delete the job.

If this does not happen now, it is a bug. It shouldn't require a qdel -p.
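Glen's point can be stated as a simple reconciliation rule. A job that pbs_server still lists on the node, but that the restarted mom has no record of, will never send an obit, so the server should drop it instead of waiting. The Python below is a toy model of that rule under those assumptions. It is not TORQUE's actual code, and NodeView and reconcile_after_mom_restart are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class NodeView:
    """Hypothetical server-side record of one execution node."""
    name: str
    # Job IDs pbs_server currently believes are running on this node.
    running: set[str] = field(default_factory=set)


def reconcile_after_mom_restart(node: NodeView, mom_known_jobs: set[str]) -> set[str]:
    """Drop jobs that the restarted mom does not know about.

    A job in node.running but absent from mom_known_jobs will never
    produce an obit, so it is removed from the node and returned so the
    caller can delete it from the server's job table.
    """
    stale = node.running - mom_known_jobs
    node.running -= stale
    return stale


if __name__ == "__main__":
    # Made-up job IDs for illustration.
    node = NodeView("node042", {"4711[3].master", "4711[7].master", "4712.master"})
    # A diskless node comes back with no job records at all.
    print(sorted(reconcile_after_mom_restart(node, set())))
    print(node.running)
```

In the diskless case every job is stale, which is exactly the situation where, per David, a manual qdel -p is needed today.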




> 
> David
> 
> On Feb 1, 2013, at 8:47 AM, "Andrus, Brian Contractor" <bdandrus at nps.edu> wrote:
> 
>> All,
>> 
>> Running TORQUE 4.1.4 here (along with Moab 7.2.0).
>> Issue: a node that had several elements of an array job running on it crashes.
>> It reboots, gets re-provisioned, and comes back up.
>> pbsnodes still claims there are several jobs running on it.
>> If I run pbs_mom purge on the node, nothing changes.
>> If I restart pbs_server (which I hate doing, since it resets Time Used on running jobs), nothing changes.
>> 
>> Shouldn't the jobs automatically be either restarted or cleared when a node reboots? I'm pretty sure TORQUE used to do that...
>> 
>> 
>> 
>> Brian Andrus
>> ITACS/Research Computing
>> Naval Postgraduate School
>> Monterey, California
>> voice: 831-656-6238
>> 
>> 
>> 
>> 
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers