[torqueusers] Torque not deleting job
aemerich at us.ibm.com
Mon Apr 23 07:26:24 MDT 2007
I just wanted to give some additional information on tests I ran:
1. I did try to restart the pbs_mom with a -r to see if it would remedy
the problem and it did not.
2. The "qsig -s 0 1160" only returned a '0' return code, but the server
still thought the process was there.
3. "qdel 1160" does work to clear the job from the server.
IBM Corporation - Rochester, MN
Office: 030-3 F305
Office: (507) 253-5483
Cell: (507) 358-2999
aemerich at us.ibm.com
"Insanity: doing the same thing over and over again and expecting different
results." -Albert Einstein
From: csamuel at vpac.org
Sent by: torqueusers at supercluster.org
Subject: Re: [torqueusers] Torque not deleting job
On Sat, 21 Apr 2007, Adam Emerich wrote:
Thanks for the replies to myself and Garrick, the plot thickens!
> 1. root 2015 1 0 08:54 ? 00:00:02
> -> by default pbs_mom is not started with "-r" on our system
The pbs_mom manual page says this about starting a pbs_mom with the -r option:
If the -r option is used following a
reboot, process IDs (pids) may be reused
and MOM may kill a process that is not a
That could be a Bad Thing(tm). :-)
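To illustrate the PID reuse hazard the manual page warns about, here is a minimal sketch of one way to tell whether a recorded PID still refers to the same process. It is not how pbs_mom tracks jobs; it only assumes a Linux /proc filesystem, and the function names proc_start_ticks and is_same_process are made up for this example. The idea is to record the process start time alongside the PID and compare both later.

```python
import os
from typing import Optional


def proc_start_ticks(pid: int) -> Optional[int]:
    """Return the start time of `pid` in clock ticks since boot, or None if
    no such process exists. Linux only: reads /proc/<pid>/stat."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces, so split on the last ')' before counting fields.
    fields = stat.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so field 22 (starttime) is fields[19].
    return int(fields[19])


def is_same_process(pid: int, recorded_start: int) -> bool:
    """True only if `pid` exists and started at the recorded time.
    A recycled PID will normally have a different start time."""
    return proc_start_ticks(pid) == recorded_start


if __name__ == "__main__":
    me = os.getpid()
    started = proc_start_ticks(me)
    print(me, started, is_same_process(me, started))
```

Since the start time is counted from boot, a match after a reboot is not impossible, only unlikely, so treat this as a sanity check rather than a guarantee.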
> 2. There is no entry in the server log for a failed epilogue or even a
> message that says the job is being terminated (note jobid is now 1160 as
> had to recreate the issue to get more details). The first failure in the
> log is due to another process being run that was eventually preempted by
> job 1160:
Interesting - anything in the pbs_mom logs on the node about that job ?
> 3. "qsig -s 0 1160" did not terminate the job from the server's point of
> view.
OK - now that's just plain bizarre - that is supposed to identify whether or
not the child process exists for it and unless you've got a process ID
getting recycled (not beyond the realms of possibility) then it should
declare that process dead and clear up.
It certainly does on our RH 7.3, FC5 and SLES 9 clusters!
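For anyone unfamiliar with the signal 0 trick, here is a small sketch of the general POSIX behaviour it relies on: sending signal 0 delivers nothing, but the call still reports whether the target PID exists. This is not TORQUE's code, and pid_exists is just a name chosen for the example.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single PID.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: the process exists but belongs to someone else
    return True


if __name__ == "__main__":
    print(pid_exists(os.getpid()))
```

Note that a recycled PID passes this check just as well as the original process would, which is why PID reuse can leave a stale job looking alive.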
Long shot - do you have SELinux enabled? If so, can you disable it and
see if it still happens?
Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia