[torqueusers] Help with Torque 4.2.4 - Nodes O.K., but jobs 'Q' and error 15010 on qrun
dbeer at adaptivecomputing.com
Mon Aug 12 09:38:34 MDT 2013
I would look at the pbs_mom logs on the node that the job was sent to
(compute-0-4.local, according to your tracejob output) in order to
discover why it didn't work.
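As a rough sketch of how to find the right log, the snippet below pulls the
target node out of the "send of job to ... failed" line in the tracejob
output and builds the path of that day's MOM log. It assumes the default
TORQUE_HOME of /var/spool/torque and one MOM log file per day named
YYYYMMDD under mom_logs/; both are assumptions about your install, so
adjust them if your site differs. The function name failed_sends is made
up for this example.

```python
import re
from datetime import datetime

# Assumption: default TORQUE_HOME; change this if your install uses another prefix.
TORQUE_HOME = "/var/spool/torque"

# Lines copied from the tracejob output in the original message.
TRACEJOB_LINES = [
    "08/09/2013 17:50:39 S Job Run at request of root@<redacted.host.name>",
    "08/09/2013 17:50:39 S send of job to compute-0-4.local failed error =",
    "08/09/2013 17:50:39 S unable to run job, MOM rejected/rc=-1",
]

# Matches the server-side ("S") record that names the node the job was sent to.
SEND_FAILED = re.compile(
    r"^(\d{2}/\d{2}/\d{4}) \d{2}:\d{2}:\d{2} S send of job to (\S+) failed"
)


def failed_sends(lines):
    """Yield (node, mom_log_path) for each failed send found in tracejob lines."""
    for line in lines:
        m = SEND_FAILED.match(line)
        if m:
            day = datetime.strptime(m.group(1), "%m/%d/%Y")
            # Assumption: MOM logs are one file per day, named YYYYMMDD.
            yield m.group(2), f"{TORQUE_HOME}/mom_logs/{day:%Y%m%d}"


for node, path in failed_sends(TRACEJOB_LINES):
    print(f"on {node}, read {path}")
```

The log file lives on the compute node itself, not on the head node, so
you would log in to the node it names and read the file there.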
On Fri, Aug 9, 2013 at 9:07 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Dear all,
> I just installed torque 4.2.4 from scratch on a CentOS cluster (ROCKS) I'm
> working on. I followed the instructions in the manual
> <http://docs.adaptivecomputing.com/torque/help.htm>.
> The output of running 'pbsnodes -a' is the following:
> state = free
> np = 24
> ntype = cluster
> status =
> compute-0-14.local 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64,opsys=linux
> mom_service_port = 15002
> mom_manager_port = 15003
> When I submit a job, it shows up in 'qstat' but stays in the Queued ('Q')
> state. Issuing 'qrun' produces the following error message:
> qrun: Execution server rejected request MSG=cannot send job to mom,
> state=TRNOUT 3.<redacted.host.name>
> Issuing 'tracejob' to see what's up gives this in return:
> 08/09/2013 17:50:26 S enqueuing into batch, state 1 hop 1
> 08/09/2013 17:50:26 A queue=batch
> 08/09/2013 17:50:39 S Job Run at request of root@<redacted.host.name>
> 08/09/2013 17:50:39 S send of job to compute-0-4.local failed error =
> 08/09/2013 17:50:39 S unable to run job, MOM rejected/rc=-1
> 08/09/2013 17:50:39 S unable to run job, send to MOM '3232238330'
> Can anyone offer a hint as to what might be going on? Google doesn't know
> about that TRNOUT state, or about anything similar.
> Disclaimer: I'm neither a sysadmin nor an IT guy, but I can read.
David Beer | Senior Software Engineer