[torquedev] memory limit enforcement by pbs_mom - REQUEST FOR FEEDBACK

Åke Sandgren ake.sandgren at hpc2n.umu.se
Wed Feb 1 00:01:06 MST 2006


On Tue, 2006-01-31 at 13:18 -0700, Dave Jackson wrote:
> Greetings,
> 
>   Currently, the pbs_mom enforces memory limits specified with '-l
> pmem=X' but does not enforce memory limits specified with '-l mem=X'.
> This is confusing for some users.  I propose that we modify
> mom_set_limits() to enforce stack and data segment limits if pmem is
> specified or mem is specified and the job is serial.
> 
>   This should have the impact that serial jobs now have mem limits
> enforced.  Are there any concerns with this change?

First of all, the documentation needs to define EXACTLY what the
different limits are supposed to do. Today it is unclear whether (v)mem
is meant to limit the whole job (the sum over all nodes/tasks), all
processes on one node, or all processes belonging to one task. And mem
isn't mentioned at all in at least pbs_resources_linux, only vmem and
p(v)mem.

Make sure you do the "right" thing, i.e. don't touch stack_limit
(RLIMIT_STACK). The system takes care of that anyway, and it should
preferably be left at whatever the system default is.

Also remember that, at least on Linux, the kernel only handles RLIMIT_AS
(vmem) in any real sense; RLIMIT_DATA is only checked in very special
cases and RLIMIT_RSS is not used at all.
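
To make the RLIMIT_AS point concrete, here is a minimal sketch of what a
kernel-enforced limit looks like from user space. It is not taken from
pbs_mom; the helper names set_address_space_limit() and child_can_map()
are made up for illustration, and it assumes a Linux/POSIX system with
anonymous mmap(). A forked child caps its own address space and then
tries to map some memory; RLIMIT_STACK is deliberately left alone.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: cap the address space (RLIMIT_AS) of the calling
 * process. RLIMIT_STACK is not touched. Returns 0 on success, -1 on error. */
int set_address_space_limit(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    /* An unprivileged process cannot raise its hard limit, so clamp. */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;
    rl.rlim_cur = bytes;
    rl.rlim_max = bytes;
    return setrlimit(RLIMIT_AS, &rl);
}

/* Hypothetical helper: fork a child, cap it at limit_bytes, and let it try
 * to map alloc_bytes of anonymous memory.
 * Returns 1 if the mapping succeeded, 0 if it was refused, -1 on error. */
int child_can_map(rlim_t limit_bytes, size_t alloc_bytes)
{
    pid_t pid;
    int status;

    /* A zero-length mmap() fails with EINVAL, which would look like a
     * refusal caused by the limit. */
    assert(alloc_bytes > 0);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        void *p;

        if (set_address_space_limit(limit_bytes) != 0)
            _exit(2);
        p = mmap(NULL, alloc_bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        _exit(p == MAP_FAILED ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

With a 256 MB cap, child_can_map() should report that a 1 GB mapping is
refused while a 1 MB mapping still goes through. The kernel does the
refusing at mmap() time; nothing has to poll the process.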

So... if you want to enforce something other than RLIMIT_AS (pvmem, as I
interpret things anyway), you would have to do all of it in pbs_mom "by
hand".
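
As a rough idea of what "by hand" could mean for a resident-memory limit,
here is a sketch that reads VmRSS from /proc/<pid>/status on Linux and
kills the process if it is over the limit. This is only an illustration
under stated assumptions, not what pbs_mom does: the function names are
hypothetical, the choice of SIGKILL is mine, and a real daemon would have
to call something like this periodically for every process in the job.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: parse one line of /proc/<pid>/status.
 * Returns the resident set size in kB if this is the VmRSS line, else -1. */
long parse_vmrss_line(const char *line)
{
    long kb;

    assert(line != NULL);
    if (sscanf(line, "VmRSS: %ld", &kb) == 1)
        return kb;
    return -1;
}

/* Hypothetical helper: current resident set size of pid in kB,
 * or -1 if it cannot be determined. */
long read_rss_kb(pid_t pid)
{
    char path[64];
    char line[256];
    long kb = -1;
    FILE *fp;

    if (pid <= 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        kb = parse_vmrss_line(line);
        if (kb >= 0)
            break;
    }
    fclose(fp);
    return kb;
}

/* Hypothetical polling step: kill pid if its RSS exceeds limit_kb.
 * Returns 1 if a signal was sent, 0 if within the limit, -1 on error. */
int enforce_rss_limit(pid_t pid, long limit_kb)
{
    long rss_kb = read_rss_kb(pid);

    if (rss_kb < 0)
        return -1;
    if (rss_kb <= limit_kb)
        return 0;
    return kill(pid, SIGKILL) == 0 ? 1 : -1;
}
```

Unlike RLIMIT_AS, this only catches the process at the next poll, so it
can overshoot the limit between checks.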

Is pbs_mom really enforcing any limits "by hand" today? I was going to
check the code but haven't gotten around to it...



