[torqueusers] submitted jobs not running all nodes

Mark Moorcroft Mark.W.Moorcroft at nasa.gov
Tue Jun 18 14:23:54 MDT 2013


I am testing a beta of a torque/maui "roll" for the Rocks clustering 
software. It is supposed to be torque 4.2.2, and I run it on CentOS 6.x. I 
seem to be having the same issue. According to the maui logs, the job is 
distributed to both of my test nodes, but all the work runs on one 
node.


> I'm running a couple of clusters, one 64-node cluster and one 4-node
> cluster, using the default torque package for scheduling and
> everything else. When I submit a job that should use more than
> one node, it does not use all of the nodes but stays on one node.
> When I run tracejob <job-id> or qstat -f <job-id>, it
> shows that the nodes have been allocated to the job and everything
> appears to be fine. If I go to the nodes individually and run top or ps
> -ef, the job appears on only one node and uses only the processors of
> that node.
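
One way to narrow this down from inside the job itself is to look at the
file that torque names in $PBS_NODEFILE. As far as I know it lists one
hostname per allocated processor slot, and the batch script itself is
started only on the first node in that list, so work reaches the other
nodes only if the script launches it there (for example with pbsdsh or an
MPI launcher). The sketch below is only an illustration, not part of the
roll: the helper name slots_per_host is made up, and the only thing it
assumes about torque is the one-hostname-per-line nodefile format.

```python
import os
from collections import Counter


def slots_per_host(nodefile_text):
    """Count how many lines (processor slots) each host has in a nodefile.

    Assumes the torque nodefile format: one hostname per line, repeated
    once per allocated slot. Blank lines are ignored.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


if __name__ == "__main__":
    # PBS_NODEFILE is only set inside a running torque job.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as fh:
            counts = slots_per_host(fh.read())
        for host, slots in sorted(counts.items()):
            print(host, slots)
    else:
        print("PBS_NODEFILE is not set; run this from inside a job script")
```

If this prints both nodes with the expected slot counts but top still
shows processes on one node only, the allocation is probably fine and the
question becomes how the job script starts work on the remote nodes.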


