[torqueusers] Torque on 1000 nodes ?

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Fri Jul 1 01:43:20 MDT 2005

Hi Garrick,

Thanks a lot.  We have two big academic clusters in Denmark that
really need this information!

> These questions would have been a lot more interesting back in the OpenPBS
> days :)

I quite agree.  I started to use OpenPBS in late 1999 on our
first large Alpha cluster (http://dcwww.camp.dtu.dk/valhal.html)
so I know about the weaknesses of OpenPBS :-)

> I can personally attest to Torque working just fine on 1700 nodes, whereas the
> old OpenPBS code started having problems at 256 nodes.  

This is crucial information for us.  Thanks a lot!

> Overall, it's lots of jobs that are a harder problem.  Fortunately we've had
> recent improvements in that area.  I can now have 8 thousand queued jobs and a
> few hundred running jobs without a problem.

We typically have 100-200 jobs running, and three times as many queued.
With PBSPro 5.4.2 that's no sweat at all.  However, I recently
found out that the Maui scheduler has a hard-coded limit of 4096 jobs,
as you described.

Which version of Torque do you use to get the "recent
improvements" you alluded to?  What problems should we look out for?

With best regards,

Ole Holm Nielsen
Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kongens Lyngby, Denmark
