[torqueusers] question about the 'purging job without checking MOM'

Bowen Kan kanbw at ihep.ac.cn
Wed Feb 12 00:22:03 MST 2014


 

Hi All,

 

We are using Torque 2.5.5 and Maui 2.3.6-p21. (I'm a beginner.)

 

We are running into a problem where pbs_mom fails to clean up the processes
created by some jobs.

 

It always happens with the same kind of jobs from the same users.

 

When this happens, the job no longer shows up on the pbs_server ( qstat -an1 |
grep jobid ), but it is still running under pbs_mom, occupying the CPU and
continuously writing to its output file.

 

So I want to figure out why the server logged 'purging job without checking
MOM' for this job, and why and when the server purges a job without checking
the MOM.

 

Any help?

 

Thank you very much.

 

Job: 6303017.server.ihep.ac.cn

 

02/10/2014 09:53:41  S    enqueuing into hxmtq, state 1 hop 1

02/10/2014 09:53:41  S    Job Queued at request of zhangjuan at login.ihep.ac.cn,
                          owner = zhangjuan at login.ihep.ac.cn, job name = proton_25.sh,
                          queue = hxmtq

02/10/2014 09:53:41  A    queue=hxmtq

02/10/2014 09:55:25  S    Job Modified at request of root at server.ihep.ac.cn

02/10/2014 09:55:25  S    Job Run at request of root at pbssrv.ihep.ac.cn

02/10/2014 09:55:25  S    Job Modified at request of root at pbssrv.ihep.ac.cn

02/10/2014 09:55:25  S    Not sending email: User does not want mail of this type.

02/10/2014 09:55:25  A    user=zhangjuan group=hxmt jobname=proton_25.sh queue=hxmtq
                          ctime=1391997221 qtime=1391997221 etime=1391997221 start=1391997325
                          owner=zhangjuan at lxslc512.ihep.ac.cn exec_host=hxmt048.ihep.ac.cn/0
                          Resource_List.cput=100:00:00 Resource_List.neednodes=hxmt048.ihep.ac.cn

02/10/2014 11:15:59  S    enqueuing into hxmtq, state 4 hop 1

02/10/2014 11:30:35  S    enqueuing into hxmtq, state 4 hop 1

02/10/2014 14:54:06  S    enqueuing into hxmtq, state 4 hop 1

02/10/2014 18:36:34  S    enqueuing into hxmtq, state 4 hop 1

02/10/2014 23:00:08  S    enqueuing into hxmtq, state 4 hop 1

02/11/2014 04:40:32  S    enqueuing into hxmtq, state 4 hop 1

02/11/2014 05:00:35  S    enqueuing into hxmtq, state 4 hop 1

02/11/2014 05:06:06  S    enqueuing into hxmtq, state 4 hop 1

02/11/2014 09:15:05  S    enqueuing into hxmtq, state 4 hop 1

02/11/2014 10:30:35  S    enqueuing into hxmtq, state 4 hop 1

02/11/2014 18:50:08  S    purging job without checking MOM

02/11/2014 18:50:08  S    dequeuing from hxmtq, state RUNNING
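
In case it helps anyone looking at similar traces, here is a minimal Python
sketch (not part of Torque) that scans output like the above and reports when
the server purged a job without checking the MOM. The line format is assumed
from this excerpt only, and parse_events and find_purges are hypothetical
helper names.

```python
import re
from datetime import datetime

# Assumed record shape, taken from the excerpt above:
#   MM/DD/YYYY HH:MM:SS  <source letter>  <message>
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+([A-Z])\s+(.*)$")


def parse_events(text):
    """Return (timestamp, source, message) tuples; wrapped lines are joined."""
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S")
            events.append([ts, m.group(2), m.group(3)])
        elif events:
            # No timestamp: treat as a continuation of the previous record.
            events[-1][2] += " " + line
    return [tuple(e) for e in events]


def find_purges(events):
    """Timestamps at which the server purged the job without checking the MOM."""
    return [ts for ts, src, msg in events
            if src == "S" and "purging job without checking MOM" in msg]


sample = """\
02/10/2014 09:55:25  S    Job Run at request of root at pbssrv.ihep.ac.cn
02/11/2014 10:30:35  S    enqueuing into hxmtq, state 4 hop 1
02/11/2014 18:50:08  S    purging job without checking MOM
02/11/2014 18:50:08  S    dequeuing from hxmtq, state RUNNING
"""

events = parse_events(sample)
purges = find_purges(events)
print(purges)
```

This only locates the purge events in a saved trace; it does not explain why
the server issued them.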

 

Bowen Kan

====================================================================

Computing Center, the Institute of High Energy Physics, CAS, China

Kan, Bowen                         Tel: (+86) 10 8823 6883

P.O. Box 918-7                       Fax: (+86) 10 8823 6839

Beijing 100049  P.R. China           Email: Bowen.Kan at ihep.ac.cn

===================================================================

 


