[torqueusers] Re: [Mauiusers] maui memory consumption with
garrick at clusterresources.com
Tue Aug 8 11:34:57 MDT 2006
On Tue, Aug 08, 2006 at 10:39:59AM -0700, Sam Rash alleged:
> Ooh, I may have missed something: we regularly hit maui with 5k jobs
> daily--the default for MMAX_JOB is 4096. What does this actually mean?
> Only 4096 will be considered by maui at a time? (ie, left in the RM)
Correct. Any jobs after the max are simply ignored.
When you think about it, 4096 jobs can't actually run at the same time
(you don't have that many nodes), so there isn't much need for maui to
read in more jobs.
When I came across this problem on my own cluster, I found that the "bad
user" would always exceed any job limit that I built into maui. A strategy
to deal with this is to use routing queues in TORQUE...
set server default_queue = default
create queue default queue_type=R,route_destinations=mainexec
create queue mainexec queue_type=E,max_queuable=1000
I have a fairly deeply nested set of routing queues for different groups
of users, each with different max resources, acls, max_queuables, and
max_user_runs. The idea is to prevent a user in one group from swamping
maui and keeping other queues from executing.
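
As a rough sketch of what per-group routing could look like (this is not
my actual config; the group names "chem" and "phys" and all of the limits
are made up for illustration, so check the attribute names against your
TORQUE version before feeding it to qmgr):

```
# Hypothetical per-group setup; group names and limits are assumptions.
# The routing queue tries each destination in order.
create queue default queue_type=route
set queue default route_destinations = chem
set queue default route_destinations += phys
set queue default enabled = true
set queue default started = true

# Execution queue for one group: the ACL rejects other groups, so their
# jobs fall through to the next route destination.
create queue chem queue_type=execution
set queue chem acl_group_enable = true
set queue chem acl_groups = chem
set queue chem max_queuable = 1000
set queue chem max_user_run = 50
set queue chem enabled = true
set queue chem started = true

# Second group with its own, smaller limits.
create queue phys queue_type=execution
set queue phys acl_group_enable = true
set queue phys acl_groups = phys
set queue phys max_queuable = 500
set queue phys max_user_run = 20
set queue phys enabled = true
set queue phys started = true

set server default_queue = default
```

With something like this, each group's execution queue should stop
accepting jobs once it hits its own max_queuable, so one group's backlog
doesn't eat into the job slots maui has left for everyone else.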