[torqueusers] Torque MOMs terminating jobs immediately after starting
Prakash Velayutham
prakash.velayutham at cchmc.org
Mon Mar 2 10:07:15 MST 2009
Hello,
I am running Torque 2.3.6 (in --ha mode, if that makes any difference).
Intermittently, I am seeing the following behaviour with the MOMs.
A job is accepted by the server and scheduled by the Moab (5.3.1)
scheduler, but as soon as the job is shipped to the compute node, the
node terminates it and writes the following to its log:
03/02/2009 12:00:26;0001; pbs_mom;Job;TMomFinalizeJob3;job 5619.bmiclustersvcd1.cchmc.org started, pid = 22521
03/02/2009 12:00:26;0008; pbs_mom;Job;5619.bmiclustersvcd1.cchmc.org;Job Modified at request of PBS_Server at bmiclustersvcd1.cchmc.org
03/02/2009 12:00:26;0008; pbs_mom;Job;5619.bmiclustersvcd1.cchmc.org;kill_task: killing pid 22863 task 1 gracefully with sig 15
03/02/2009 12:00:26;0080; pbs_mom;Job;5619.bmiclustersvcd1.cchmc.org;scan_for_terminated: job 5619.bmiclustersvcd1.cchmc.org task 1 terminated, sid=22521
03/02/2009 12:00:26;0008; pbs_mom;Job;5619.bmiclustersvcd1.cchmc.org;job was terminated
03/02/2009 12:00:26;0080; pbs_mom;Svr;preobit_reply;top of preobit_reply
03/02/2009 12:00:26;0080; pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
03/02/2009 12:00:26;0080; pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
03/02/2009 12:00:26;0008; pbs_mom;Job;5619.bmiclustersvcd1.cchmc.org;checking job post-processing routine
03/02/2009 12:00:26;0080; pbs_mom;Job;5619.bmiclustersvcd1.cchmc.org;obit sent to server
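For anyone who wants to check their own MOM logs for the same pattern,
here is a minimal Python sketch. It is not part of Torque: the function
name find_short_lived_jobs and the 5-second threshold are assumptions
made for illustration, and it only recognises the two message formats
seen in the excerpt above ("job <id> started, pid = <n>" and
"<id>;job was terminated"). It pairs each start line with a later
termination line for the same job and reports jobs that ended within
the threshold.

```python
import re
from datetime import datetime

# Patterns taken from the log excerpt above; other Torque versions may
# word these messages differently (assumption).
STARTED = re.compile(r"job (\S+) started, pid = (\d+)")
TERMINATED = re.compile(r";Job;(\S+?);job was terminated")


def parse_time(line):
    """Parse the leading 'MM/DD/YYYY HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line.split(";", 1)[0], "%m/%d/%Y %H:%M:%S")


def find_short_lived_jobs(lines, max_seconds=5):
    """Return (job_id, elapsed_seconds) for jobs that were terminated
    within max_seconds of their 'started' entry."""
    started = {}  # job id -> start time
    hits = []
    for raw in lines:
        line = raw.strip()
        m = STARTED.search(line)
        if m:
            started[m.group(1)] = parse_time(line)
            continue
        m = TERMINATED.search(line)
        if m and m.group(1) in started:
            elapsed = (parse_time(line) - started[m.group(1)]).total_seconds()
            if elapsed <= max_seconds:
                hits.append((m.group(1), elapsed))
    return hits
```

It can be fed any iterable of log lines, for example an open file
object for one day's MOM log on the affected node. Each log entry has
to be on a single line for the patterns to match.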
As I mentioned earlier, this happens only intermittently. It also
happens with interactive jobs, so it does not seem to be caused by the
batch scripts. See below:
velge9 at bmiclusterd1:~> qsub -I -lnodes=bmi-xeon3-04
qsub: waiting for job 5620.bmiclustersvcd1.cchmc.org to start
qsub: job 5620.bmiclustersvcd1.cchmc.org ready
qsub: job 5620.bmiclustersvcd1.cchmc.org completed
velge9 at bmiclusterd1:~>
I am baffled. Any help appreciated.
Thanks,
Prakash