[torquedev] [Bug 86] Implement transparent resource limits

bugzilla-daemon at supercluster.org
Thu Oct 7 01:51:28 MDT 2010


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=86

--- Comment #8 from Simon Toth <SimonT at mail.muni.cz> 2010-10-07 01:51:27 MDT ---
(In reply to comment #7)
> (In reply to comment #6)
> > > No, that is the thing I want to avoid completely: no scheduling decisions
> > > must be made based on the transparent resource limits (server/queue
> > > configuration attribute leaf resource_limits), and job rejection _is_ a
> > > scheduling decision.  What I need is to say "If that job _in the process of its
> > > execution_ exceeds the specified limit, kill it".  It is ulimit on steroids or
> > > "MOM-powered per-queue ulimit over the Torque protocol" (tm).
> > 
> > Why would you want to do that?
> 
> Why would I want to do what?  Have you ever tuned ulimits on the machines, say,
> via /etc/security/limits.conf?  I just need to enforce resource limits -- that
> is all I want.

There are tons of possible approaches. Torque supports only ulimit as far as I
know.

> > That's super ineffective.
> 
> Please, explain your point.
> 
> > You will allow the job grow over the limit, but kill it when it happens?
> 
> Yes, and that is called limit enforcement.  By the way, that is how law
> enforcement works: before someone is arrested, they must have violated
> something, not the other way round (in a perfect world, of course ;))

No, it definitely doesn't work this way. What you are doing is drawing an
invisible line (in a place where you should build a fence) and shooting everyone
who crosses the line.

> Once again: Grid jobs arrive without any clues (for Torque) about their
> memory requirements.

One of the problems, but OK, this one might not be solvable.

> So, I know from the SLA and some empirical knowledge that I should give no more
> than 4gb of virtual memory; so I will enforce this limit: any job that takes
> more vmem will be killed.
> 
> You might say that this is ineffective, that such jobs shouldn't be allowed to
> execute at all, but I can't predict at submission time that a job will go over
> the limit: crime first, punishment second.

First, jobs should declare that. Second, I'm not talking about submission, I'm
talking about limiting (not killing) during runtime.
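
For reference, the "kill it once it crosses the limit" approach from the quoted
text can be sketched as a single polling step. This is only an illustration
under stated assumptions, not how pbs_mom implements enforcement: it assumes a
Linux /proc filesystem, and the function names are hypothetical.

```python
import os
import signal


def parse_vmsize_kb(status_text):
    """Return the VmSize value (in kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            # The line looks like "VmSize:   123456 kB".
            return int(line.split()[1])
    return None


def exceeds_limit(vmsize_kb, limit_kb):
    """True when the measured virtual memory is strictly over the cap."""
    return vmsize_kb is not None and vmsize_kb > limit_kb


def enforce_once(pid, limit_kb):
    """Check one process once and kill it if it is over the cap (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        vmsize_kb = parse_vmsize_kb(fh.read())
    if exceeds_limit(vmsize_kb, limit_kb):
        os.kill(pid, signal.SIGKILL)
        return True
    return False
```

Because the check is periodic, a process can grow past the cap between two
polls, which is the "invisible line" objection raised above.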

> > > The real reason why I created that patch is that our Grid cluster was flooded
> > > with jobs that ate 15-25 GB of virtual memory and, given that we mostly
> > > have 8-slot machines, the OOM killer was pretty busy on them; so busy that some
> > > kernel threads weren't woken up for 3-4 minutes.
> > 
> > Well, why don't you limit the amount of memory in the first place?
> 
> Via what means?

Cgroups, ulimit, virtual machines... Millions of choices.
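
As an illustration of the ulimit option only, here is a minimal sketch that
starts a child process with its address space capped up front, so an oversized
allocation fails inside the job instead of the job being killed afterwards. It
assumes a Unix system where RLIMIT_AS is enforced (Linux, for example); the
helper name run_with_vmem_cap is hypothetical and is not part of Torque.

```python
import resource
import subprocess
import sys


def run_with_vmem_cap(argv, limit_bytes):
    """Run argv in a child whose address space is capped at limit_bytes."""

    def set_cap():
        # Runs in the child between fork() and exec().  Never ask for more
        # than an existing hard limit, because setrlimit would reject that.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            cap = limit_bytes
        else:
            cap = min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

    return subprocess.run(argv, preexec_fn=set_cap,
                          capture_output=True, text=True)
```

For example, run_with_vmem_cap([sys.executable, "job.py"], 4 * 1024**3) would
start a hypothetical job.py with a 4 GiB virtual memory ceiling; allocations
beyond it fail in the job (malloc returns NULL, Python raises MemoryError).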

> > > But when I tried to use resources_max/resources_default, Maui started to
> > > underfill our slots, because resources_max/resources_default are turned into
> > > job requirements and are not only enforced on the MOM side.  So, the codename
> > > "transparent" was born ;))
> > 
> > Well, that's definitely a Maui configuration problem and has pretty much
> > nothing to do with Torque.
> 
> I am sorry, but you're plain wrong.
> Maui does what it should do: it evaluates job requirements and selects slots
> based on them.
> The problem is that I just don't want _administrator-set_ resource limits to be
> treated as the job requirements.

And that's a configuration issue in Maui and not Torque.

> If a job additionally specifies requirements -- let it be, the scheduler should
> obey the requests (if they aren't higher than resources_max).

Again, nothing to do with Torque, pure Maui configuration.

> > Not a very good idea to fix a Maui configuration problem with a patch for Torque :-D
> 
> I am sorry for being a bit harsh, but given that you haven't seen my Torque and
> Maui configuration, you can't judge whether there are configuration problems in
> it.
>
> Once again: if the job itself specifies the limit -- let it be, Maui should
> respect it and choose a slot that fulfills the requirement.  But _not every
> job_ will want, say, 4gb of vmem, that's the problem.  Some of them will want
> only 1gb, so forcing the scheduler to find slots with 4gb of free vmem for such
> a job is ineffective, because I want all our job slots to be populated with
> tasks.  I know that the average memory consumption for a job is 2gb, so I am
> setting the 4gb cap to filter out outrageous jobs.

Why would any scheduler allocate 4GB for a job that requests 1GB?
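
The underfill described in the quoted text concerns jobs that request nothing
themselves, where the queue default is counted as their requirement. A rough
arithmetic sketch of that effect, using the 8 slots, 4gb cap and 2gb average
from this thread; the 16 GB node size and the function name are assumptions
made purely for illustration.

```python
def schedulable_jobs(node_mem_gb, slots, assumed_job_mem_gb):
    """Jobs a memory-aware scheduler would place on one node."""
    # Limited both by the number of slots and by how many jobs of the
    # assumed size fit into the node's memory.
    return min(slots, node_mem_gb // assumed_job_mem_gb)


# Hypothetical 16 GB node with 8 slots.
as_requirement = schedulable_jobs(16, 8, 4)  # 4gb default counted per job
as_average = schedulable_jobs(16, 8, 2)      # 2gb average actual usage
```

Under these assumptions, half of the slots stay empty when the 4gb default is
counted as each job's requirement, while all eight fill at the 2gb average.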

> If you have an idea how to do it with the Torque/Maui combo without using my
> patch, while still letting a job specify its own resource requirements -- I am
> all ears.

You should ask that on the Maui mailing list, not here.

-- 
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.