[torqueusers] RE: job can't be killed by mom

Tina Declerck tinad at nersc.gov
Wed Oct 29 13:01:39 MDT 2008


Here is some additional data.

I see the same type of output: the mom reports that one or more
running tasks were found.

Here is what momctl reports:

job[567126.jacin03-m.nersc.gov]  state=EXITING  sidlist=8183

However, there is no process with a PID of 8183:
ps -elf | grep 8183
4 S root      6976  6399  0  77   0 -   644 pipe_w 11:46 pts/0    00:00:00 grep 8183
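
One possibility, stated as an assumption rather than confirmed TORQUE behaviour: the values in sidlist may be session IDs rather than PIDs. `ps -elf` does not print a session ID column, so grepping its output for 8183 would not show processes that merely belong to session 8183. The following is a minimal Python sketch for Linux that lists the PIDs whose session ID matches a given value by reading /proc/<pid>/stat. The function names `session_of` and `pids_in_session` are illustrative only and are not part of TORQUE.

```python
import os


def session_of(stat_line):
    """Return the session ID from the contents of one /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split at the LAST ')' before tokenizing.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest is now: state, ppid, pgrp, session, ...
    return int(rest[3])


def pids_in_session(sid, proc_root="/proc"):
    """Return a sorted list of PIDs whose session ID equals sid."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # process exited between listdir() and open()
        if session_of(line) == sid:
            pids.append(int(entry))
    return sorted(pids)
```

Calling `pids_in_session(8183)` on the affected node would show whether any process is still attached to that session. If the assumption about sidlist holds, an empty result would suggest the mom's record of the session is stale.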

Where does the mom look for active processes?

Thank you for any assistance,
Tina Declerck
tinad at nersc.gov



 > Recently several jobs have not been able to be killed.  In one case I
 > see the following in one of the mom_logs:
 >
 > 10/20/2008 12:27:00;0008;   pbs_mom;Job;566567.jacin03-m.nersc.gov;received request 'SIGNAL_TASK' for job 566567.jacin03-m from 10.1.60.237:1023
 > 10/20/2008 12:27:00;0008;   pbs_mom;Job;566567.jacin03-m;im_request: SIGNAL_TASK 566567.jacin03-m.nersc.gov from node 0 task 3548 signal 9
 > 10/20/2008 12:27:00;0002;   pbs_mom;Svr;im_request;connect from 10.1.60.237:1023
 > 10/20/2008 12:27:00;0008;   pbs_mom;Job;566567.jacin03-m.nersc.gov;received request 'SIGNAL_TASK' for job 566567.jacin03-m from 10.1.60.237:1023
 > 10/20/2008 12:27:00;0008;   pbs_mom;Job;566567.jacin03-m.nersc.gov;im_request: SIGNAL_TASK 566567.jacin03-m from node 0 task 3549 signal 9
 > 10/20/2008 12:27:00;0002;   pbs_mom;Svr;im_request;connect from 10.1.60.237:1023
 > 10/20/2008 12:27:00;0008;   pbs_mom;Job;566567.jacin03-m.nersc.gov;received request 'KILL_JOB' for job 566567.jacin03-m from 10.1.60.237:1023
 > 10/20/2008 12:27:00;0008;   pbs_mom;Job;kill_job;im_request: sending signal 9, "KILL" to job 566567.jacin03-m.nersc.gov, reason: kill_job message received
 > 10/20/2008 12:27:00;0080;   pbs_mom;Svr;scan_for_exiting;searching for exiting jobs
 > 10/20/2008 12:27:00;0008;   pbs_mom;Job;566567.jacin03-m.nersc.gov;one or more running tasks found - no obit sent
 > 10/20/2008 12:27:24;0002;   pbs_mom;n/




