[torqueusers] Torque on 1000 nodes?

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Wed Jun 29 04:33:34 MDT 2005

We're considering whether to move our 900+ node Linux cluster to
the Torque resource manager.  However, we're unsure if Torque
will work reliably on a cluster with this many nodes, since
there may be all sorts of resource limits when the server
has to communicate with ~1000 nodes.  The Torque page says
that it scales above 2500 nodes, but I'd be interested in
real production experiences.  My questions are:

1. Can anyone recommend for or against Torque on large clusters?

2. What special tweaking must be done on large clusters?

3. Does the Maui scheduler work reliably with Torque?

FYI, our cluster has fairly fast Pentium-4 nodes and
Gigabit/100Mbit Ethernet (no Myrinet or other custom networks).
The homepage is http://www.dcsc.dtu.dk/English/Niflheim.aspx

Ole Holm Nielsen
Department of Physics, Technical University of Denmark
