[torqueusers] Large cluster considerations

Caird, Andrew J acaird at umich.edu
Wed Feb 20 13:21:45 MST 2008

Hello all,

We periodically look at Appendix F of the Torque wiki, "Large Cluster
Considerations", as our cluster grows.

A while back Garrick mentioned something about never using --disable-rpp,
but that practice is encouraged in Appendix F.

Are there other things in that Appendix that are bad ideas?
Particularly good ideas?

What other things are people doing with large ( > 500 node; > 1000 node)
clusters?  What qualifies as a "large cluster" from Torque's perspective?

As we grow, what should we be looking for?  Is there an acceptable level
of pbs_mom errors greater than zero?

Thanks for any advice or discussion.


