[torqueusers] torque/moab holding back nodes?

Steven A. DuChene linux-clusters at mindspring.com
Wed Apr 5 16:17:17 MDT 2006


On our new 128-node cluster we are attempting to do some acceptance testing,
benchmarking, and system burn-in. I am having trouble getting torque to let me
use all of the available nodes. I have tried submitting a single 128-node job, and
I have also tried submitting a BUNCH of smaller jobs. The 128-node job (an HPL
benchmarking job in this case) is rejected at submission with "job exceeds queue
resource limits". Submitting a large group of mixed smaller jobs (40 or so 8-way
jobs mixed with 40 2-way jobs, all HPL runs) always leaves 5 to 8 nodes sitting
idle while jobs are still waiting idle in the queue.

Our queue and scheduler configuration is a VERY simple default config, so I don't
understand why this is taking place. We have torque-2.0.0p7 and moab-4.5.0p0
installed on this cluster.

Any ideas or suggestions as to what I should be looking at to see why this is happening?
--
Steven A. DuChene

