[torqueusers] submitted jobs not running on all the requested nodes

tarelom at gmail.com
Thu Jun 6 10:20:03 MDT 2013


I'm running a couple of clusters, one with 64 nodes and one with 5 nodes,
using the default Torque package for scheduling and everything else.
When I submit a job that should use more than one node, it does not use
all of the nodes; it stays on one node instead.
When I run tracejob <job-id> or qstat -f <job-id>, the output shows that the
nodes have been allocated to the job and everything appears to be fine. If I
log in to the nodes individually, they have the appropriate job files in the
mom directory, but if I run top or ps -ef, the job appears on only one node
and uses only that node's processors. It does not show up on any of the
other nodes it has been assigned.
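One way to check what Torque actually handed the job, from inside the job itself, is to read the node file named by the PBS_NODEFILE environment variable, which Torque fills with one hostname per allocated processor slot. Keep in mind that Torque runs the batch script itself only on the first allocated node (the mother superior); the other nodes only get work if the script starts it there, for example with mpirun or pbsdsh. The sketch below assumes Python 3 is available on the compute nodes, and the helper name summarize_nodefile is hypothetical.

```python
import os
from collections import Counter


def summarize_nodefile(path):
    """Count how many slots each host has in a Torque node file.

    Torque writes one line per allocated processor slot, so a host
    appears once for each ppn it was given. Blank lines are ignored.
    """
    with open(path) as f:
        hosts = [line.strip() for line in f if line.strip()]
    return Counter(hosts)


if __name__ == "__main__":
    # PBS_NODEFILE is only set inside a running Torque job.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile is None:
        print("PBS_NODEFILE is not set; run this inside a Torque job")
    else:
        counts = summarize_nodefile(nodefile)
        print(f"{len(counts)} host(s), {sum(counts.values())} slot(s)")
        for host, slots in sorted(counts.items()):
            print(f"  {host}: {slots}")
```

If this reports several hosts but top on the compute nodes still shows work on only one of them, the allocation itself is fine and the processes are simply never being launched on the remaining nodes.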

Does anyone have any idea what may be causing this behavior or what setting
I need to change in order to fix it?

Chris Bright
Computer Professional
Scientific Computing and Imaging Institute
University of Utah