[torqueusers] Torque on 1000 nodes ?

Garrick Staples garrick at usc.edu
Thu Jun 30 14:20:52 MDT 2005


On Wed, Jun 29, 2005 at 12:33:34PM +0200, Ole Holm Nielsen alleged:
> We're considering whether to move our 900+ node Linux cluster to
> the Torque resource manager.  However, we're unsure if Torque
> will work reliably on a cluster with this many nodes, since
> there may be all sorts of resource limits when the server
> has to communicate with ~1000 nodes.  The Torque page says
> that it scales above 2500 nodes, but I'd be interested in
> real production experiences.  My questions are:
> 
> 1. Can anyone recommend for or against Torque on large clusters ?

These questions would have been a lot more interesting back in the OpenPBS
days :)

I can personally attest to Torque working just fine on 1700 nodes, whereas the
old OpenPBS code started having problems at 256 nodes.  

Overall, large numbers of jobs are a harder problem than large numbers of
nodes.  Fortunately, we've had recent improvements in that area.  I can now
have 8 thousand queued jobs and a few hundred running jobs without a problem.

 
> 2. What special tweaking must be done on large clusters ?

None of these are strictly necessary, but they keep things running smoothly
for me when thousands of jobs are submitted.

These slow things down a wee bit.
  set server node_ping_rate = 300
  set server node_check_rate = 600
  set server tcp_timeout = 6

These keep things responding well when thousands of jobs are submitted.
  set server job_stat_rate = 45
  set server poll_jobs = True
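
As a rough sketch, the five settings above could be kept in one place and fed
to qmgr as a batch.  The helper name emit_qmgr_settings is made up for this
example, and the final pipe into qmgr is left commented out so the script only
prints the batch unless you enable it on the pbs_server host.

```shell
#!/bin/sh
# Sketch: keep the server tunables in one place and feed them to qmgr as a batch.
# emit_qmgr_settings is a hypothetical helper name, not part of Torque.
emit_qmgr_settings() {
    cat <<'EOF'
set server node_ping_rate = 300
set server node_check_rate = 600
set server tcp_timeout = 6
set server job_stat_rate = 45
set server poll_jobs = True
EOF
}

# Print the batch for review.
emit_qmgr_settings

# To apply it on the pbs_server host (needs Torque manager privileges),
# uncomment the next line; qmgr reads directives from standard input.
# emit_qmgr_settings | qmgr
```

Printing first and applying second makes it easy to review or diff the values
before touching a production server.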

Both pbs_server and maui have the ability to trigger a scheduling iteration at
regular intervals.  I think most people have maui "drive" the scheduling
iterations with an RMPOLLINTERVAL of 1 to 2 minutes.  I find it better to have
pbs_server drive it because its iteration timeout resets when a job is
submitted (which triggers an iteration), and it runs better when thousands of
jobs are submitted.

  set server scheduler_iteration = 60  (1 minute)
  RMPOLLINTERVAL        00:60:00 (in maui.cfg)  (1 hour)
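
To make the two intervals directly comparable, here is a small sketch that
converts the maui.cfg value to seconds.  It assumes the value is read as
HH:MM:SS, as the "(1 hour)" note implies; hms_to_seconds is a made-up helper
name for this example.

```shell
#!/bin/sh
# hms_to_seconds is a hypothetical helper: converts HH:MM:SS to seconds.
# awk treats each field as a decimal number, so leading zeros are harmless.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

server_interval=60                          # scheduler_iteration, already in seconds
maui_interval=$(hms_to_seconds "00:60:00")  # RMPOLLINTERVAL from maui.cfg

echo "pbs_server triggers an iteration every ${server_interval}s"
echo "maui polls on its own every ${maui_interval}s"
```

With these values maui's own timer only fires as a fallback, so pbs_server is
effectively the one driving the iterations.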

 
> 3. Does the Maui scheduler work reliably with Torque ?

Maui's limits are well understood and documented:
http://clusterresources.com/products/maui/docs/a.ddevelopment.shtml

I bump these up when building maui:
perl -pi -e 's/^#define MMAX_JOB .*/#define MMAX_JOB 8192/' include/msched.h
perl -pi -e 's/^#define MAX_MJOB .*/#define MAX_MJOB 8192/' include/msched.h
perl -pi -e 's/^#define MAX_MCLASS  .*/#define MAX_MCLASS  32/' include/msched-common.h

(I think the docs are wrong regarding MMAX_JOB and MAX_MJOB)
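
Before patching the real source tree, the substitution can be tried on a
scratch file.  This is only a sketch: the header content below is invented for
the dry run (the starting value and the SOME_OTHER_LIMIT name are placeholders,
not taken from the Maui source).

```shell
#!/bin/sh
# Dry run of the MMAX_JOB substitution on a scratch header, not the real Maui tree.
tmpdir=$(mktemp -d)
header="$tmpdir/msched.h"

# Hypothetical starting content; the real defaults may differ.
printf '%s\n' '#define MMAX_JOB 4096' '#define SOME_OTHER_LIMIT 1024' > "$header"

# Same in-place edit as above, pointed at the scratch file.
perl -pi -e 's/^#define MMAX_JOB .*/#define MMAX_JOB 8192/' "$header"

# Confirm the target line changed and the unrelated line did not.
patched=$(grep '^#define MMAX_JOB ' "$header")
untouched=$(grep '^#define SOME_OTHER_LIMIT ' "$header")
echo "$patched"
echo "$untouched"

rm -rf "$tmpdir"
```

The same grep against the real include/msched.h after patching is a quick way
to check that the pattern actually matched before rebuilding.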


> FYI, our cluster has fairly fast Pentium-4 nodes and
> Gigabit/100Mbit Ethernet (no Myrinet or other custom networks).
> The homepage is http://www.dcsc.dtu.dk/English/Niflheim.aspx

We have a mix of 32bit Xeons, 64bit Xeons, 32bit Opterons, 64bit Opterons, and
PIIIs, with and without Myrinet.

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California