[torqueusers] can't delete a job from the queue without restarting or SIGHUP'ing all pbs_mom processes on the node where the job ran

Sabuj Pattanayek sabujp at gmail.com
Thu Apr 25 12:58:31 MDT 2013

Hi all,

Does anyone know what might cause a job that has completed or been
qdel'd to remain in the output of qstat? For example, here is one such
job; the mom_logs show that it was terminated:

# grep -R -i 4895385 *
mom_logs/20130425:04/25/2013 00:48:03;0001;pbs_mom;Job;TMomFinalizeJob3;job 4895385.piranha started, pid = 25069
mom_logs/20130425:04/25/2013 01:33:28;0080;pbs_mom;Job;4895385.piranha;scan_for_terminated: job 4895385.piranha task 1 terminated, sid=25069
mom_logs/20130425:04/25/2013 01:33:28;0008;pbs_mom;Job;4895385.piranha;job was terminated
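When checking many stuck jobs, it can help to pull the records for one job out of a raw mom_logs file rather than grepping by hand. The sketch below assumes the six semicolon-separated fields visible in the grep output above (timestamp, event code, daemon, object type, object name, message); the names `MomLogEntry`, `parse_mom_log_line` and `job_events` are hypothetical helpers, not part of TORQUE.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class MomLogEntry(NamedTuple):
    """One pbs_mom log record: the six ';'-separated fields seen above."""
    timestamp: str    # e.g. "04/25/2013 01:33:28"
    event_code: str   # e.g. "0008"
    daemon: str       # e.g. "pbs_mom"
    object_type: str  # e.g. "Job"
    object_name: str  # e.g. "4895385.piranha" or "TMomFinalizeJob3"
    message: str      # free text; may itself contain ';'


def parse_mom_log_line(line: str) -> Optional[MomLogEntry]:
    """Split one raw log line; return None if it lacks six fields."""
    # At most 5 splits, so any ';' inside the message stays in the message.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return MomLogEntry(*parts)


def job_events(lines: Iterable[str], job_id: str) -> Iterator[Tuple[str, str]]:
    """Yield (timestamp, message) for each record that names job_id."""
    for line in lines:
        entry = parse_mom_log_line(line)
        if entry is None:
            continue
        # The "started" record names the job only in its message field.
        if job_id in entry.object_name or job_id in entry.message:
            yield entry.timestamp, entry.message
```

Passing an open handle on the raw file mom_logs/20130425 (which has no "filename:" prefix, unlike grep -R output) together with "4895385.piranha" should yield the same three records shown above.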

On that node there are two pbs_mom processes:

root      3180  0.0  0.1  45096 28116 ?        SLsl Apr10   6:21
root     25991  0.0  0.1  45020 25956 ?        S    01:33   0:00
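A node running more than one pbs_mom, as here, can be spotted without eyeballing ps output. This is a minimal sketch assuming Linux, where /proc/<pid>/comm holds the executable name (truncated to 15 characters, which is enough for "pbs_mom"); `find_processes` is a hypothetical helper, and `proc_root` is a parameter only so it can be pointed at a test directory.

```python
import os


def find_processes(name: str, proc_root: str = "/proc") -> list[int]:
    """Return sorted PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited meanwhile, or comm is unreadable
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)
```

On a healthy node, find_processes("pbs_mom") would presumably return a single PID; a longer list is worth flagging on each node.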

If I run killall -1 pbs_mom (sending SIGHUP to every pbs_mom process),
the more recently started pbs_mom (from 1:33 AM today) terminates, and
the job is then removed from pbs_server's qstat output. I saw this:
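The second pbs_mom's 01:33 start time matches the 01:33:28 "job was terminated" entries in the log. The ps START column only gives HH:MM for processes started today, so a finer-grained comparison can use field 22 (starttime, in clock ticks since boot) of /proc/<pid>/stat as documented in proc(5). This is a sketch assuming Linux; `start_ticks` and `newest_pid` are hypothetical helpers.

```python
import os


def start_ticks(pid: int, proc_root: str = "/proc") -> int:
    """Return field 22 (starttime, clock ticks after boot) of /proc/<pid>/stat."""
    with open(os.path.join(proc_root, str(pid), "stat")) as f:
        stat = f.read()
    # Field 2 (comm) is in parentheses and may contain spaces or ')',
    # so split only the text after the last ')'; rest[0] is field 3.
    rest = stat[stat.rindex(")") + 2:].split()
    return int(rest[22 - 3])


def newest_pid(pids, proc_root: str = "/proc") -> int:
    """Of the given PIDs, return the one that started most recently."""
    return max(pids, key=lambda p: start_ticks(p, proc_root))
```

With the PIDs from the ps listing above, newest_pid([3180, 25991]) would be expected to pick 25991, the process started at 01:33.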


...and I suspect pbs_mom is somehow not sending pbs_server the job's
obituary, but why not? None of the job owner's processes are still
running on the node.
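The check that none of the job owner's processes remain can also be scripted. This minimal sketch assumes Linux and compares the real UID (the first number on the "Uid:" line of /proc/<pid>/status) against a given UID; `pids_owned_by_uid` is a hypothetical helper, and the UID for a user name can be obtained with pwd.getpwnam(name).pw_uid.

```python
import os


def pids_owned_by_uid(uid: int, proc_root: str = "/proc") -> list[int]:
    """Return sorted PIDs whose real UID in /proc/<pid>/status equals `uid`."""
    owned = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    if line.startswith("Uid:"):
                        # Format: "Uid: real effective saved filesystem"
                        if int(line.split()[1]) == uid:
                            owned.append(int(entry))
                        break
        except OSError:
            continue  # process exited or status is unreadable
    return sorted(owned)
```

An empty list for the job owner's UID supports the observation that nothing of the job is left running. This sketch only looks at the real UID, so a process that changed its real UID would not be counted.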

