[torqueusers] torque not scaling well
csamuel at vpac.org
Wed Aug 1 18:55:51 MDT 2007
On Wed, 1 Aug 2007, Miles O'Neal wrote:
> But we still start seeing slowdowns in job run rates somewhere between 1000
> and 1500 jobs queued, and once we get up around 3K jobs queued, forget it.
It is worth keeping in mind that this could be a Maui scaling problem, not
Torque. Maui is the part that is scanning the queues and trying to work out
the best order to run jobs in based on your policy.
One suggestion based on what we do here is to look at limiting the number of
running and idle jobs each user can have to make life fairer across the board
and to stop people queue stuffing.
On our clusters we set something like:
This means that any user can have a maximum of 15 running jobs if others are
waiting to run and 20 if nobody else has anything queued (hah!). In all
cases a user can only have 5 jobs in the queue eligible to run and gain
priority.
My guess is that this will cut down the load on Maui: once a user has hit
his job limit, the rest of his jobs should be passed over (assuming that is
how Maui handles it internally!).
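For reference, the limits described above could be written in maui.cfg
roughly as follows. This is only a sketch: it assumes Maui's USERCFG
throttling syntax, where MAXJOB takes a soft,hard pair and MAXIJOB caps the
idle (eligible) jobs per user. The numbers are the ones quoted in this
message; check the parameter names against the admin guide for your Maui
version before using it.

    # maui.cfg (sketch, assumed syntax)
    # MAXJOB=soft,hard : 15 running jobs per user while others are waiting,
    #                    up to 20 when nobody else has anything queued
    # MAXIJOB          : at most 5 idle jobs per user are eligible to run
    USERCFG[DEFAULT] MAXJOB=15,20 MAXIJOB=5

Using DEFAULT applies the same limits to every user; a per-user line such as
USERCFG[someuser] (a hypothetical name) would override it for one account.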
Best of luck!
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency