[torquedev] [Bug 86] Implement transparent resource limits

bugzilla-daemon at supercluster.org
Wed Oct 6 23:58:15 MDT 2010


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=86

--- Comment #7 from Eygene Ryabinkin <rea+maui at grid.kiae.ru> 2010-10-06 23:58:15 MDT ---
(In reply to comment #6)
> > No, that is the thing that I completely want to avoid: no scheduling decisions
> > must be made basing on the transparent resource limits (server/queue
> > configuration attribute leaf resource_limits) and job reject _is_ the
> > scheduling decision.  What I need is to say "If that job _in the process of its
> > execution_ exceeds the specified limit, kill it".  It is ulimit on steroids or
> > "MOM-powered per-queue ulimit over the Torque protocol" (tm).
> 
> Why would you want to do that?

Why would I want to do what?  Have you ever tuned ulimits on a machine, say,
via /etc/security/limits.conf?  I just need to enforce resource limits; that is
all I want.

> That's super ineffective.

Please explain your point.

> You will allow the job grow over the limit, but kill it when it happens?

Yes, and that is called limit enforcement.  By the way, that is how law
enforcement works: someone has to violate something before being arrested, not
the other way round (in a perfect world, of course ;))

Once again: Grid jobs arrive without giving Torque any clue about their memory
requirements.
From the SLA and some empirical knowledge, I know that I should give a job no
more than 4gb of virtual memory, so I will enforce this limit: any job that
takes more vmem will be killed.

You might say that this is ineffective, that such jobs shouldn't be allowed to
run at all, but at submission time I can't predict that a job will go over the
limit: crime first, punishment second.
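
To make the "crime first, punishment second" semantics concrete, here is a
minimal sketch in Python.  It is only an illustration and not the code from the
patch: the function name, the sample values and the idea of polling vmem usage
as a list of readings are all assumptions made for this example.

```python
def first_violation(vmem_samples_gb, limit_gb):
    """Return the index of the first usage sample above the limit, or None.

    Hypothetical helper: models a monitor that lets the job start and only
    reacts once a reading actually exceeds the administrator-set cap.
    """
    for index, used_gb in enumerate(vmem_samples_gb):
        if used_gb > limit_gb:
            return index  # the job would be killed at this reading
    return None  # the job never exceeded the cap and runs to completion


# Made-up usage readings for one job, in GB of virtual memory.
samples = [0.5, 1.2, 3.9, 4.6, 9.0]
killed_at = first_violation(samples, 4.0)
```

Nothing here is decided at submission time: the job is admitted regardless, and
the cap matters only once a reading crosses it.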

> > The real reason why I created that patch is that our Grid cluster was drowned
> > with the jobs that ate 15-25 Gb of virtual memory and, given that we mostly
> > have 8 slot machines, OOM killer was pretty busy on them; so busy that some
> > kernel threads weren't waked up for 3-4 minutes.
> 
> Well, why don't you limit the amount of the memory in the first place?

By what means?

> > But when I tried to use resources_max/resources_default, Maui started to
> > underfill our slots, because resources_max/resources_default are transformed to
> > the job requirements and not only enforced on the MOM side.  So, the codename
> > "transparent" was born ;))
> 
> Well, that's definitely a Maui configuration problem and has pretty much
> nothing with Torque.

I am sorry, but you're plainly wrong.
Maui does what it should do: it evaluates job requirements and selects slots
based on them.
The problem is that I just don't want _administrator-set_ resource limits to be
treated as job requirements.
If a job additionally specifies its own requirements, fine: the scheduler
should obey those requests (as long as they aren't higher than resources_max).

> Not a very good idea to fix a Maui configuration problem with a patch for Torque :-D

I am sorry for being a bit harsh, but given that you haven't seen my Torque and
Maui configurations, you can't judge whether there are configuration problems
in them.

Once again: if a job itself specifies a limit, fine: Maui should respect it and
choose a slot that fulfills the requirement.  But _not every job_ will want,
say, 4gb of vmem; that's the problem.  Some of them will want only 1gb, so
forcing the scheduler to find slots with 4gb of free vmem for such jobs is
ineffective, because I want all our job slots to be populated with tasks.  I
know that the average memory consumption per job is 2gb, so I am setting the
4gb cap to filter out outrageous jobs.
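
A rough back-of-the-envelope sketch of the underfill, in Python.  The 8 slots
and the 4gb cap come from this thread; the 16 GB of vmem per node and the
function name are assumptions picked purely for illustration, and real Maui
placement logic is of course more involved than this.

```python
def jobs_placed(node_vmem_gb, node_slots, per_job_request_gb):
    """Count the jobs a scheduler would place on a single node.

    Hypothetical model: each job reserves per_job_request_gb of vmem;
    a request of 0 means the job states no memory requirement at all.
    """
    if per_job_request_gb <= 0:
        return node_slots  # memory does not constrain placement
    fit_by_memory = node_vmem_gb // per_job_request_gb
    return min(node_slots, fit_by_memory)


NODE_VMEM_GB = 16  # assumed node size, not taken from the thread
NODE_SLOTS = 8     # "we mostly have 8 slot machines"
CAP_GB = 4         # administrator-set vmem cap

# Cap turned into a per-job requirement: memory runs out after 4 jobs.
as_requirement = jobs_placed(NODE_VMEM_GB, NODE_SLOTS, CAP_GB)

# Cap enforced only at run time: the scheduler sees no memory request.
as_cap_only = jobs_placed(NODE_VMEM_GB, NODE_SLOTS, 0)
```

Under these assumed numbers half of the slots stay empty as soon as the cap is
treated as a requirement, even though the average job needs only 2gb.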

If you have an idea of how to do this with the Torque/Maui combo without my
patch, while still letting a job specify its own resource requirements, I am
all ears.


