[torqueusers] Long hostnames on large clusters causing problems in torque.
Roy Dragseth
roy.dragseth at cc.uit.no
Wed Dec 5 13:56:21 MST 2007
The default Rocks cluster setup uses names like compute-x-y.local for the
compute nodes. This seems to cause problems in torque when one wants to run
a large job. With this naming convention, the queueing system becomes
unusable when a user submits a large job; I have tried 4096 cpus.
If I submit a 4096 cpu job then this is what qstat shows:
# qstat
qstat: End of File
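For a rough sense of scale, here is a small sketch of how long the host list
of such a job could get with the long names. It assumes the list holds one
"host/index" entry per cpu joined with "+", and a hypothetical layout of 16
racks with 32 nodes of 8 cpus each; neither is taken from the actual cluster,
and the function name is made up for illustration.

```python
def host_list_length(names, cpus_per_node):
    """Length of a '+'-joined list with one 'host/index' entry per cpu.

    The 'host/index' entry format is an assumption made for this estimate.
    """
    entries = [f"{name}/{i}" for name in names for i in range(cpus_per_node)]
    return len("+".join(entries))


# Hypothetical layout: 16 racks x 32 nodes x 8 cpus = 4096 cpus.
long_names = [
    f"compute-{rack}-{rank}.local" for rack in range(16) for rank in range(32)
]

print(host_list_length(long_names, 8))
```

Every extra character in a hostname is paid once per cpu in such a list, so
the total grows with both the name length and the job size.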
Of course the quick fix is to shorten the hostnames, and fortunately Rocks has
shortname aliases of the form cx-y. Using this convention in the nodes file
makes the 4096 cpu job run fine, but with the current growth in cluster
sizes it will not take long before even short-named clusters run into the
same problem.
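A minimal sketch of the nodes file rewrite, assuming each line starts with
the hostname followed by optional settings such as np=8, and that x and y
are plain numbers. The helper names are hypothetical, not part of Rocks or
torque.

```python
import re

# Matches the long Rocks-style names, e.g. compute-3-12.local
LONG_NAME = re.compile(r"^compute-(\d+)-(\d+)\.local$")


def shorten(hostname):
    """Map compute-x-y.local to cx-y; leave any other name unchanged."""
    m = LONG_NAME.match(hostname)
    return f"c{m.group(1)}-{m.group(2)}" if m else hostname


def shorten_nodes_line(line):
    """Rewrite the hostname (first field) of a nodes file line, keep the rest."""
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return line  # blank or comment line: nothing to rewrite
    fields[0] = shorten(fields[0])
    return " ".join(fields)


print(shorten_nodes_line("compute-3-12.local np=8"))
```

The short aliases must of course resolve on every node before the rewritten
file is put to use.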
r.
--
The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone:+47 77 64 41 07, fax:+47 77 64 41 00
Roy Dragseth, Team Leader, High Performance Computing
Direct call: +47 77 64 62 56. email: royd at cc.uit.no