[torqueusers] Help with Torque 4.2.4 - Nodes O.K., but jobs 'Q' and error 15010 on qrun

David Beer dbeer at adaptivecomputing.com
Mon Aug 12 09:38:34 MDT 2013


I would look at the pbs_mom logs on the node the job was sent to
(compute-0-4.local, according to the tracejob output) to find out why the
send failed.
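
One detail in the quoted tracejob output that may help when checking the node:
the MOM is identified there only by the number '3232238330'. A minimal sketch
of decoding it, assuming (this is not confirmed by the Torque documentation)
that the number is the MOM's IPv4 address stored as a 32-bit integer in network
byte order. The function name decode_mom_address is hypothetical, chosen only
for this illustration.

```python
import ipaddress


def decode_mom_address(value: int) -> str:
    """Interpret a 32-bit integer as a dotted-quad IPv4 address.

    Assumption: the integer holds the address in network (big-endian) order.
    """
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    # Value copied from the tracejob line "send to MOM '3232238330' failed".
    print(decode_mom_address(3232238330))  # prints 192.168.10.250
```

If that assumption holds, the result can be compared with the address that
compute-0-4.local resolves to on the server, to confirm which node's log to
read.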


On Fri, Aug 9, 2013 at 9:07 PM, João Rodrigues <anaryin at gmail.com> wrote:

> Dear all,
>
> I just installed Torque 4.2.4 from scratch on a CentOS cluster (ROCKS) I'm
> working on. I followed the instructions in the manual
> (http://docs.adaptivecomputing.com/torque/help.htm).
>
> The output of running 'pbsnodes -a' is the following:
>
> compute-0-14.local
>      state = free
>      np = 24
>      ntype = cluster
>      status = rectime=1376103766,varattr=,jobs=,state=free,netload=44353668,gres=,loadave=0.00,ncpus=24,physmem=37140756kb,availmem=37454964kb,totmem=38164748kb,idletime=86646,nusers=0,nsessions=0,uname=Linux compute-0-14.local 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64,opsys=linux
>      mom_service_port = 15002
>      mom_manager_port = 15003
>
> When I try to submit a job it shows up in 'qstat' but as Queued. Issuing
> 'qrun' produces the following error message:
>
> qrun: Execution server rejected request MSG=cannot send job to mom, state=TRNOUT 3.<redacted.host.name>
>
> Issuing 'tracejob' to see what's up gives this in return:
>
> 08/09/2013 17:50:26  S    enqueuing into batch, state 1 hop 1
> 08/09/2013 17:50:26  A    queue=batch
> 08/09/2013 17:50:39  S    Job Run at request of root@<redacted.host.name>
> 08/09/2013 17:50:39  S    send of job to compute-0-4.local failed error = 15010
> 08/09/2013 17:50:39  S    unable to run job, MOM rejected/rc=-1
> 08/09/2013 17:50:39  S    unable to run job, send to MOM '3232238330' failed
>
> Can anyone offer a hint about what might be going on? Google turns up
> nothing about that TRNOUT state or anything similar.
>
> Cheers,
>
> João
>
> Disclaimer: I'm not a sysadmin nor IT guy, but I can read.
>


-- 
David Beer | Senior Software Engineer
Adaptive Computing