[torqueusers] limit the number of jobs a user can submit

Martin Siegert siegert at sfu.ca
Mon Oct 3 11:07:11 MDT 2011


Hi Gareth,

On Mon, Oct 03, 2011 at 10:08:28PM +1100, Gareth.Williams at csiro.au wrote:
> > -----Original Message-----
> > From: Martin Siegert [mailto:siegert at sfu.ca]
> > Sent: Saturday, 1 October 2011 4:44 AM
> > To: Torque Users Mailing List
> > Subject: [torqueusers] limit the number of jobs a user can submit
> > 
> > Hi,
> > 
> > I know this has been discussed before, but I believe an important
> > aspect has been overlooked:
> > 
> > Moab has a limit on the number of jobs it can handle: the MAXJOB
> > parameter:
> > 
> > "Specifies the maximum number of simultaneous jobs which can be
> > evaluated
> > by the scheduler. If additional jobs are submitted to the resource
> > manager,
> > Moab will ignore these jobs until previously submitted jobs complete."
> > 
> > This allows for a trivial denial-of-service attack:
> > Simply submit a job array with at least MAXJOB+1 elements.
> > 
> > After that moab will disregard all further jobs for scheduling
> > even if they have a much higher priority than the array job elements.
> > 
> > I have not yet found a way of preventing this DoS attack.
> > The most logical solution to me would be to expand the
> > "max_user_queuable"
> > specification to allow for a server wide setting, not just a per
> > queue setting, i.e.,
> > 
> > set server max_user_queuable = 1000
> > 
> > Is that a feasible solution?
> > (and, yes, I'd like this limit to be in torque and not in moab because
> > the user will get an immediate response from qsub).
> > 
> > Cheers,
> > Martin
> > 
> > --
> > Martin Siegert
> > Simon Fraser University
> 
> Hi Martin,
> 
> We were bitten by this last week (for the first time ever that we know
> of) and increased MAXJOB.  I think using a combination of routing queues
> and execution queues with max_user_queuable should work. That way a user
> can only deny service to themselves.  This solution is advocated here:
> http://www.clusterresources.com/pipermail/torqueusers/2007-August/005922.html
> A recent query has more detail but unfortunately was unanswered:
> http://www.clusterresources.com/pipermail/torqueusers/2011-September/013339.html
> I'd like to try this setup but don't want the dependency problems.

We have been bitten by this at least three times now and have already
increased MAXJOB to 8192. I am very reluctant to go any higher: moab is
already using more than 8GB of address space (I suspect mostly because
of our complicated fstree structure), and increasing MAXJOB can only
make this worse.
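
For reference, that is just the one-line MAXJOB parameter in our
moab.cfg, currently:

    MAXJOB 8192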

I'd like to try the second setup - is there some documentation on what
route_held_jobs=False actually does? And, yes, the dependency problem is
ugly.
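
For concreteness, the minimal version of that setup, as I understand it,
would be something like the following (queue names and the limit of 1000
are just placeholders, and the comment on route_held_jobs is my guess at
what it does - which is exactly what I would like to have confirmed):

    # users submit to a routing queue ...
    qmgr -c "set queue route1 queue_type = Route"
    qmgr -c "set queue route1 route_destinations = exec1"
    # ... which, presumably, does not forward jobs while they are held
    qmgr -c "set queue route1 route_held_jobs = False"
    # per-user cap on the execution queue behind it
    qmgr -c "set queue exec1 max_user_queuable = 1000"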

> Perhaps it can work for you if you increase MAXJOB (say 40k) and set
> max_user_queuable modestly (say 1000).  With those numbers you wouldn't
> get problems until 40 users submitted 1000 jobs each, assuming they
> don't use multiple queues.

This could work - in all cases we have seen so far it was a single user
who took down the system by submitting huge array jobs. However, my
limits would have to be much smaller. The one time I tried this I was
not sure whether moab was treating jobs in routing queues correctly -
I need to test this again.
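
Concretely, something like the following is what I would test; the 500
is a made-up number, the point being that it has to stay small relative
to a MAXJOB that I cannot push much beyond 8192:

    # per-user cap on each execution queue, well below MAXJOB
    qmgr -c "set queue exec1 max_user_queuable = 500"
    qmgr -c "set queue exec2 max_user_queuable = 500"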

> Cheers,
> 
> Gareth
> 
> Note: if there were to be a server-wide max_user_queuable, that would
> imply that jobs could not go into routing queues as well as execution
> queues.

Thanks for your suggestions. But my intent was indeed not to have jobs
sitting in routing queues - in our case jobs would actually pile up
in two routing queues:

default -> exec1, exec2, route2
route2 -> exec3, exec4, exec5

Thus, jobs destined for exec3 would pile up in route2 AND default,
which can only make the dependency problem worse.

So, yes, I would like to see a server-wide max_user_queuable for exactly
this reason: I do not want jobs to pile up in routing queues.
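
In qmgr terms, our current layout and the setting I am asking for would
look roughly like this (the last command is the proposed extension and
does not exist today):

    # existing two-level routing structure
    qmgr -c "set queue default queue_type = Route"
    qmgr -c "set queue default route_destinations = exec1,exec2,route2"
    qmgr -c "set queue route2 queue_type = Route"
    qmgr -c "set queue route2 route_destinations = exec3,exec4,exec5"

    # proposed: server-wide per-user limit (not a valid attribute today)
    qmgr -c "set server max_user_queuable = 1000"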

Cheers,
Martin

