[torqueusers] Long hostnames on large clusters causing problems in torque.

Roy Dragseth roy.dragseth at cc.uit.no
Wed Dec 5 13:56:21 MST 2007


The default Rocks cluster setup uses names like compute-x-y.local for the 
compute nodes.  This seems to cause problems in torque when one wants to run 
a large job.  With this naming convention the queueing system becomes 
unusable when a user submits a large job; I have tried 4096 cpus.

If I submit a 4096 cpu job then this is what qstat shows:

# qstat
qstat: End of File

Of course the quick fix is to shorten the hostnames, and fortunately Rocks has 
shortname aliases of the form cx-y.  Using this convention in the nodes file 
makes the 4096 cpu job run fine, but with the current growth in cluster 
sizes it will not take long before even short-named clusters run into the 
same problem.
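
To get a rough feel for why name length matters, here is a minimal Python
sketch that estimates how long a job's host list becomes under each naming
scheme.  It assumes the list is stored as one string of name/slot entries
joined by "+", the way exec_host is displayed by qstat -f, and it assumes a
hypothetical layout of 16 racks x 32 nodes x 8 cpus (4096 cpus).  The helper
names and the layout are illustrative only; this does not show where torque
actually fails.

```python
def node_names(template, racks, nodes_per_rack):
    """Build node names from a template with {rack} and {node} fields."""
    return [
        template.format(rack=r, node=n)
        for r in range(racks)
        for n in range(nodes_per_rack)
    ]


def exec_host_length(names, slots_per_node):
    """Length of a 'name/slot+name/slot+...' string covering every slot."""
    entries = [
        f"{name}/{slot}"
        for name in names
        for slot in range(slots_per_node)
    ]
    return len("+".join(entries))


# Assumed layout (not from the original post): 16 * 32 * 8 = 4096 cpus.
RACKS, NODES_PER_RACK, SLOTS = 16, 32, 8

long_names = node_names("compute-{rack}-{node}.local", RACKS, NODES_PER_RACK)
short_names = node_names("c{rack}-{node}", RACKS, NODES_PER_RACK)

long_len = exec_host_length(long_names, SLOTS)
short_len = exec_host_length(short_names, SLOTS)

print("long names :", long_len, "characters")
print("short names:", short_len, "characters")
```

Since every cpu slot repeats the full hostname, the string grows with
(name length) x (cpu count), so shorter names only postpone the point at
which a given size is reached.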

r.

-- 

  The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
              phone:+47 77 64 41 07, fax:+47 77 64 41 00
     Roy Dragseth, Team Leader, High Performance Computing
         Direct call: +47 77 64 62 56. email: royd at cc.uit.no

