[torqueusers] Torque on 1000 nodes ?
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Fri Jul 1 01:43:20 MDT 2005
Hi Garrick,
Thanks a lot. We have two big academic clusters in Denmark that
really need this information!
> These questions would have been a lot more interesting back in the OpenPBS
> days :)
I quite agree. I started to use OpenPBS in late 1999 on our
first large Alpha cluster (http://dcwww.camp.dtu.dk/valhal.html)
so I know about the weaknesses of OpenPBS :-)
> I can personally attest to Torque working just fine on 1700 nodes, whereas the
> old OpenPBS code started having problems at 256 nodes.
This is crucial information for us. Thanks a lot!
> Overall, it's lots of jobs that are a harder problem. Fortunately we've had
> recent improvements in that area. I can now have 8 thousands queued jobs and a
> few hundred running jobs without a problem.
We typically have 100-200 jobs running, and three times as many queued.
With PBSPro 5.4.2 that's no sweat at all. However, I recently
found out that the Maui scheduler has a hard-coded limit of 4096 jobs,
as you described.
Which version of Torque do you use to get the "recent
improvements" you alluded to? What problems should we look out for?
With best regards,
Ole
Ole Holm Nielsen
Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kongens Lyngby, Denmark