[torqueusers] torque not scaling well

Chris Samuel csamuel at vpac.org
Wed Aug 1 18:55:51 MDT 2007


On Wed, 1 Aug 2007, Miles O'Neal wrote:

> But we still start seeing slowdowns in job run rates somewhere between 1000
> and 1500 jobs queued, and once we get up around 3K jobs queued, forget it.

It is worth keeping in mind that this could be a Maui scaling problem rather 
than a Torque one.  Maui is the part that scans the queues and tries to work 
out the best order to run jobs in based on your policy.

One suggestion based on what we do here is to look at limiting the number of 
running and idle jobs each user can have to make life fairer across the board 
and to stop people queue stuffing.

On our clusters we set something like:

USERCFG[DEFAULT]        MAXJOB=15,20
USERCFG[DEFAULT]        MAXIJOB=5

This means that any user can have a maximum of 15 running jobs if others are 
waiting to run and 20 if nobody else has anything queued (hah!).   In all 
cases a user can only have 5 jobs in the queue eligible to run and gain 
priority.
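
To make the soft/hard pair concrete, here is a rough Python sketch of the 
rule as described above.  It is only an illustration of the policy, not how 
Maui implements it, and the function names are made up for the example.

```python
# Illustration only: models the MAXJOB=15,20 behaviour described in the post.
# The names running_limit and may_start are hypothetical, not Maui internals.

def running_limit(soft: int, hard: int, others_waiting: bool) -> int:
    """Soft limit applies while other users are waiting, hard limit otherwise."""
    return soft if others_waiting else hard


def may_start(running: int, others_waiting: bool,
              soft: int = 15, hard: int = 20) -> bool:
    """True if a user with `running` jobs could start one more."""
    return running < running_limit(soft, hard, others_waiting)


if __name__ == "__main__":
    print(may_start(15, others_waiting=True))   # at the soft limit, others waiting
    print(may_start(15, others_waiting=False))  # nobody else queued, hard limit applies
```

So a user sitting at 15 running jobs is held back only while somebody else 
has work queued.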

My guess is that this will cut down the load on Maui, as once a user has hit 
his limit the rest of his jobs should be able to be passed over (assuming 
that is how Maui does that internally!).
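
A small Python sketch of why that guess would help, assuming the idle limit 
simply filters each user's queued jobs down to the first five.  Whether Maui 
really works this way internally is exactly the open assumption above; the 
function name and the queue layout are invented for the example.

```python
from collections import defaultdict

# Illustration only: assumes MAXIJOB=5 means "at most 5 idle jobs per user
# are considered for scheduling". eligible_idle_jobs is a hypothetical name.

def eligible_idle_jobs(queue, max_idle=5):
    """queue is a list of (job_id, user) in submission order.

    Returns the job ids that would still be considered once each user
    is capped at max_idle eligible idle jobs.
    """
    seen = defaultdict(int)   # eligible idle jobs counted so far, per user
    eligible = []
    for job_id, user in queue:
        if seen[user] < max_idle:
            seen[user] += 1
            eligible.append(job_id)
    return eligible


if __name__ == "__main__":
    # 3000 queued jobs spread over 3 users
    queue = [(i, f"user{i % 3}") for i in range(3000)]
    print(len(queue), "queued,", len(eligible_idle_jobs(queue)), "eligible")
```

Under that assumption the number of jobs that need to be prioritised depends 
on the number of users, not on how many jobs each of them has submitted.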

Best of luck!
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency