[torquedev] memory limit enforcement by pbs_mom - REQUEST FOR FEEDBACK

Åke Sandgren ake.sandgren at hpc2n.umu.se
Tue Feb 7 04:51:57 MST 2006


On Mon, 2006-02-06 at 15:42 +0100, Åke Sandgren wrote:
> On Tue, 2006-01-31 at 13:18 -0700, Dave Jackson wrote:
> > Greetings,
> > 
> >   Currently, the pbs_mom enforces memory limits specified with '-l
> > pmem=X' but does not enforce memory limits specified with '-l mem=X'.
> > This is confusing for some users.  I propose that we modify
> > mom_set_limits() to enforce stack and data segment limits if pmem is
> > specified or mem is specified and the job is serial.
> > 
> >   This should have the impact that serial jobs now have mem limits
> > enforced.  Are there any concerns with this change?
> 
> After having read linux/mom_mach.c a couple of times I would suggest
> that pxxx limits get enforced with setrlimit whenever the corresponding
> xxx limit has been set, since if any process exceeds limit xxx the mom
> should kill the job anyway.
> 
> Then we have the question of what (p)mem should really limit.
> As far as I know, this may differ slightly between architectures,
> depending on what is actually possible.
> 
> On Linux the only thing you can poll from outside is rss, which means mem
> should limit rss and nothing else. This would then mean that pmem
> shouldn't try to enforce anything (since the kernel doesn't enforce
> RLIMIT_RSS), and pmem and mem should be polled for together with
> walltime, cput and vmem. Then, if any limit is requested, RLIMIT_DATA and
> RLIMIT_STACK should be raised (but probably not lowered) to the limit.
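
The "raised (but probably not lowered)" step above could be sketched
roughly as follows. This is only an illustration under my own
assumptions, not the contents of the attached patch: raise_soft_limit
is a hypothetical helper name that does not exist in linux/mom_mach.c,
and the only real calls used are the POSIX getrlimit() and setrlimit().

```c
#include <sys/resource.h>

/*
 * Hypothetical helper (not from linux/mom_mach.c).
 * Raise the soft limit of `resource` to at least `wanted` bytes.
 * The soft limit is never lowered, and the request is clamped to the
 * hard limit, since an unprivileged process cannot exceed it.
 * Returns 0 on success, -1 if getrlimit() or setrlimit() fails.
 */
static int raise_soft_limit(int resource, rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;

    /* Already unlimited, or already at least as high: leave it alone. */
    if (rl.rlim_cur == RLIM_INFINITY ||
        (wanted != RLIM_INFINITY && rl.rlim_cur >= wanted))
        return 0;

    /* Clamp the request to the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY &&
        (wanted == RLIM_INFINITY || wanted > rl.rlim_max))
        wanted = rl.rlim_max;

    rl.rlim_cur = wanted;
    return setrlimit(resource, &rl);
}
```

With something like this, the mom would call
raise_soft_limit(RLIMIT_DATA, limit) and
raise_soft_limit(RLIMIT_STACK, limit) for the requested limit, and
leave the actual mem/pmem enforcement to the polling loop.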


Attached is a first try (not tested, not even compiled) of what such a
change could look like.

Comments?

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake at hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xxmem_limit.patch
Type: text/x-patch
Size: 10233 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/torquedev/attachments/20060207/09da6400/xxmem_limit-0001.bin