[torqueusers] Help with Torque 4.2.4 - Nodes O.K., but jobs 'Q' and error 15010 on qrun

João Rodrigues anaryin at gmail.com
Fri Aug 9 21:07:36 MDT 2013


Dear all,

I just installed Torque 4.2.4 from scratch on a CentOS cluster (ROCKS) I'm
working on. I followed the instructions in the manual
<http://docs.adaptivecomputing.com/torque/help.htm>.

The output of running 'pbsnodes -a' is the following:

compute-0-14.local
     state = free
     np = 24
     ntype = cluster
     status = rectime=1376103766,varattr=,jobs=,state=free,netload=44353668,gres=,loadave=0.00,ncpus=24,physmem=37140756kb,availmem=37454964kb,totmem=38164748kb,idletime=86646,nusers=0,nsessions=0,uname=Linux compute-0-14.local 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
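
For anyone who wants to inspect that 'status' line programmatically, here is a
minimal Python sketch. It assumes the value is a flat, comma-separated list of
key=value pairs, which holds for the output above but is not something I have
checked against the Torque documentation. The helper name parse_status is made
up for this example, and the string below is a shortened copy of the line above.

```python
from datetime import datetime, timezone


def parse_status(status: str) -> dict:
    """Split a pbsnodes 'status' value into a dict (hypothetical helper)."""
    fields = {}
    for item in status.split(","):
        # partition() keeps everything after the first '=' as the value,
        # and yields an empty string for entries such as 'jobs='.
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


# Shortened copy of the status line reported for compute-0-14.local.
status = (
    "rectime=1376103766,varattr=,jobs=,state=free,netload=44353668,"
    "ncpus=24,physmem=37140756kb,opsys=linux"
)

info = parse_status(status)
print(info["state"], info["ncpus"], info["physmem"])

# Assumption: rectime is a Unix timestamp of the node's last status report.
reported = datetime.fromtimestamp(int(info["rectime"]), tz=timezone.utc)
print(reported.isoformat())
```

If the rectime assumption is right, the printed time shows how recently the
node last reported in, which is a quick way to confirm the MOM is talking to
the server.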

When I submit a job, it shows up in 'qstat' but stays in the Queued ('Q')
state. Issuing 'qrun' produces the following error message:

qrun: Execution server rejected request MSG=cannot send job to mom, state=TRNOUT 3.<redacted.host.name>

Issuing 'tracejob' to see what's up gives this in return:

08/09/2013 17:50:26  S    enqueuing into batch, state 1 hop 1
08/09/2013 17:50:26  A    queue=batch
08/09/2013 17:50:39  S    Job Run at request of root@<redacted.host.name>
08/09/2013 17:50:39  S    send of job to compute-0-4.local failed error = 15010
08/09/2013 17:50:39  S    unable to run job, MOM rejected/rc=-1
08/09/2013 17:50:39  S    unable to run job, send to MOM '3232238330' failed
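
The number in the last line, '3232238330', looks like it could be an IPv4
address printed as a plain 32-bit integer. That is an assumption on my part
(including the big-endian byte order), not something the log states. Under
that assumption, a short Python sketch using the standard ipaddress module
turns it into dotted form:

```python
import ipaddress

# Value copied from the tracejob line: send to MOM '3232238330' failed.
mom_id = 3232238330

# Assumption: the integer is an IPv4 address in network (big-endian) order.
addr = ipaddress.ip_address(mom_id)
print(addr)  # 192.168.10.250
```

Comparing the result with the address that compute-0-4.local resolves to on
the server would show whether the server is trying to reach the MOM at the
address one expects.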

Can anyone offer a hint as to what might be going on? Searching Google for
the TRNOUT state, or for anything similar, turns up nothing.

Cheers,

João

Disclaimer: I'm neither a sysadmin nor an IT guy, but I can read.