[torqueusers] mom daemon crashes

Giorgio Padoan gpadoan at inogs.it
Tue Jul 8 08:03:23 MDT 2008


When the PBS server contacts the MOM on the remote node, the daemon on that node crashes.
This is the log:

07/08/2008 15:34:59;0008;PBS_Server;Job;59.headnode.it;Job Modified at request of root at headnode.it
07/08/2008 15:34:59;0001;PBS_Server;Req;;Server could not connect to MOM
07/08/2008 15:34:59;0080;PBS_Server;Req;req_reject;Reject reply code=15070(Server could not connect to MOM), aux=0, type=ModifyJob, from root at headnode.it
07/08/2008 15:43:28;0008;PBS_Server;Job;59.headnode.it;Job Run at request of root at headnode.it
07/08/2008 15:43:28;0008;PBS_Server;Job;59.headnode.it;send of job to sissi6 failed error = 15031
07/08/2008 15:43:28;0001;PBS_Server;Svr;PBS_Server;Batch protocol error (15031) in send_job, child failed in previous commit request for job 59.headnode.it
07/08/2008 15:43:28;0008;PBS_Server;Job;59.headnode.it;unable to run job, MOM rejected/rc=1
07/08/2008 15:43:28;0080;PBS_Server;Req;req_reject;Reject reply code=15041(Execution server rejected request MSG=cannot send job to mom, state=PRERUN), aux=0, type=RunJob, from root at headnode.it
07/08/2008 15:43:28;0040;PBS_Server;Svr;headnode.it;Scheduler sent command new
07/08/2008 15:46:58;0004;PBS_Server;Svr;check_nodes;node sissi6 not detected in 249 seconds, marking node down
07/08/2008 15:46:58;0004;PBS_Server;Svr;check_nodes;node sissi7 not detected in 291 seconds, marking node down
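
For anyone reading the log above: each entry appears to be a semicolon-separated record with six fields. The following is only a minimal Python sketch, assuming that layout, that lists the nodes the server marked down. The field names (timestamp, code, daemon, kind, name, message) and the helper names are hypothetical labels chosen for illustration, not taken from the TORQUE documentation.

```python
import re

# A few lines copied from the log above.
LOG = """\
07/08/2008 15:34:59;0001;PBS_Server;Req;;Server could not connect to MOM
07/08/2008 15:43:28;0008;PBS_Server;Job;59.headnode.it;send of job to sissi6 failed error = 15031
07/08/2008 15:46:58;0004;PBS_Server;Svr;check_nodes;node sissi6 not detected in 249 seconds, marking node down
07/08/2008 15:46:58;0004;PBS_Server;Svr;check_nodes;node sissi7 not detected in 291 seconds, marking node down
"""

# Hypothetical field names; the log itself does not label its columns.
FIELDS = ("timestamp", "code", "daemon", "kind", "name", "message")


def parse_line(line):
    """Split one log line into a dict, or return None if it does not fit."""
    # Limit the split to 5 so semicolons inside the message are preserved.
    parts = line.split(";", 5)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def down_nodes(text):
    """Map each node reported as 'not detected' to the seconds reported."""
    pattern = re.compile(r"node (\S+) not detected in (\d+) seconds")
    found = {}
    for line in text.splitlines():
        record = parse_line(line)
        if record is None:
            continue
        match = pattern.search(record["message"])
        if match:
            found[match.group(1)] = int(match.group(2))
    return found


if __name__ == "__main__":
    print(down_nodes(LOG))
```

Run against the full server log, a helper like this gives a quick list of the nodes whose MOM stopped answering, which is where the MOM-side log is worth checking next.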

I have installed:

on a Linux cluster, Fedora 8 x86_64 [2.6.25]

Can you help me?

Thanks in advance.

giorgio padoan

Giorgio Padoan                              gpadoan [at] inogs.it
The computer whisperer.
GDL-SIEG Supporto informatico e grafica computerizzata
Istituto Nazionale di Oceanografia e Geofisica Sperimentale - OGS
Borgo Grotta Gigante 42/c                    PHONE +39 40 2140265
34010 - TRIESTE (ITALIA)                      FAX   +39 40 327521
