[torqueusers] problem Server could not connect to MOM
Daniel Andrzejewski
andrzeje at cs.utk.edu
Fri Jul 25 12:49:37 MDT 2008
Hi,
I have a cluster with 1 head node and 4 compute nodes, running Torque 2.3.1 on CentOS 5.1.
When I submit an interactive job, it hangs waiting for the job to start.
How can I trace the problem?
andrzeje:boba-head ~> strace qsub -I -l nodes=2:ppn=2
execve("/usr/local/bin/qsub", ["qsub", "-I", "-l", "nodes=2:ppn=2"], [/* 35 vars */]) = 0
.
.
.
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e8a000
write(1, "qsub: waiting for job 78.boba-he"..., 56qsub: waiting for job 78.boba-head.sinrg.local to start
) = 56
select(1024, [3], NULL, NULL, {30, 0}
-bash-3.1# tail -f /sw/var/torque/server_logs/20080725
07/25/2008 14:45:56;0040;PBS_Server;Svr;boba-head.sinrg.local;Scheduler sent command new
07/25/2008 14:45:57;0008;PBS_Server;Job;78.boba-head.sinrg.local;Job Modified at request of root@boba-head.sinrg.local
07/25/2008 14:45:57;0001;PBS_Server;Req;;Server could not connect to MOM
07/25/2008 14:45:57;0080;PBS_Server;Req;req_reject;Reject reply code=15070(Server could not connect to MOM), aux=0, type=ModifyJob, from root@boba-head.sinrg.local
07/25/2008 14:46:28;0008;PBS_Server;Job;78.boba-head.sinrg.local;Job Modified at request of root@boba-head.sinrg.local
07/25/2008 14:46:28;0001;PBS_Server;Req;;Server could not connect to MOM
07/25/2008 14:46:28;0080;PBS_Server;Req;req_reject;Reject reply code=15070(Server could not connect to MOM), aux=0, type=ModifyJob, from root@boba-head.sinrg.local
07/25/2008 14:46:59;0008;PBS_Server;Job;78.boba-head.sinrg.local;Job Modified at request of root@boba-head.sinrg.local
07/25/2008 14:46:59;0001;PBS_Server;Req;;Server could not connect to MOM
07/25/2008 14:46:59;0080;PBS_Server;Req;req_reject;Reject reply code=15070(Server could not connect to MOM), aux=0, type=ModifyJob, from root@boba-head.sinrg.local
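To see how often this failure repeats, the server log records could be tallied with a short script. This is only a rough sketch: it assumes each record sits on one line with six semicolon-separated fields, as in the excerpt above, and the field names and the helper names `parse_log_line` and `mom_connect_failures` are my own labels, not anything Torque defines.

```python
def parse_log_line(line):
    """Split one server log record into six semicolon-separated fields.

    The field names are illustrative labels (an assumption), chosen to
    match the layout seen in the excerpt above.
    """
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None  # wrapped or malformed line: skip it
    keys = ("timestamp", "event", "daemon", "kind", "obj", "message")
    return dict(zip(keys, parts))


def mom_connect_failures(lines, needle="could not connect to MOM"):
    """Return the parsed records whose message contains `needle`."""
    hits = []
    for line in lines:
        rec = parse_log_line(line)
        if rec is not None and needle in rec["message"]:
            hits.append(rec)
    return hits
```

Calling `mom_connect_failures(open("/sw/var/torque/server_logs/20080725"))` on the head node would return one record per matching line, with the timestamp of each failed attempt.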
-bash-3.1# showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

     0 Active Jobs       0 of    8 Processors Active (0.00%)
                         0 of    4 Nodes Active      (0.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

76                 andrzeje       Idle     4     4:00:00  Fri Jul 25 14:38:39

1 Idle Job

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

Total Jobs: 1   Active Jobs: 0   Idle Jobs: 1   Blocked Jobs: 0
-bash-3.1# dsh -g boba ps -eaf | grep pbs
boba1: root 9491 1 0 14:03 ? 00:00:00 /usr/local/sbin/pbs_mom
boba2: root 6733 1 0 14:03 ? 00:00:00 /usr/local/sbin/pbs_mom
boba3: root 6941 1 0 14:03 ? 00:00:00 /usr/local/sbin/pbs_mom
boba4: root 4040 1 0 14:17 ? 00:00:00 /usr/local/sbin/pbs_mom
-bash-3.1# ps -eaf | grep pbs
root 31789 1 0 14:03 ? 00:00:00 /usr/local/sbin/pbs_server
root 31987 31211 0 14:41 pts/2 00:00:00 grep pbs
-bash-3.1# ps -eaf | grep maui
root 31792 1 0 14:03 ? 00:00:00 /usr/local/sbin/maui
root 31989 31211 0 14:41 pts/2 00:00:00 grep maui
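The processes are running everywhere, so one further check would be whether the head node can open a TCP connection to each pbs_mom. Below is a minimal sketch of such a check. It assumes the MOMs listen on port 15002, which I believe is the Torque default but may differ on this install; the helper names `can_connect` and `check_nodes` are made up for illustration.

```python
import socket

# Assumption: 15002 is the default pbs_mom service port; adjust if the
# install was configured with a different one.
MOM_PORT = 15002


def can_connect(host, port=MOM_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


def check_nodes(nodes, port=MOM_PORT):
    """Map each node name to whether its MOM port is reachable."""
    return {node: can_connect(node, port) for node in nodes}
```

Running `check_nodes(["boba1", "boba2", "boba3", "boba4"])` on the head node would show which compute nodes accept connections. A `False` for a node whose pbs_mom is running would point toward a firewall, name resolution, or port mismatch rather than a dead daemon.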
Thanks,
Daniel
--
Daniel Andrzejewski
student IT Administrator
Elec Engr & Comp Science
University of Tennessee
(865) 974 - 4388 (work)
"Investment in knowledge always pays the best interest" Benjamin Franklin