[torquedev] memory limit enforcement by pbs_mom - REQUEST FOR FEEDBACK

David B Jackson jacksond at clusterresources.com
Wed Feb 1 00:08:21 MST 2006


  I think you may have revealed too much knowledge about the Linux kernel
and thus nominated yourself to update the docs in the WIKI! :)  Would
you be able to look at the mom_set_limits() call in the latest TORQUE
2.1.0 snapshot?  Based on this, could you update the online docs in the
WIKI and recommend changes to be made to the code?

  We can assist in whatever way would be most helpful.

  Much appreciated!


> On Tue, 2006-01-31 at 13:18 -0700, Dave Jackson wrote:
>> Greetings,
>>   Currently, the pbs_mom enforces memory limits specified with '-l
>> pmem=X' but does not enforce memory limits specified with '-l mem=X'.
>> This is confusing for some users.  I propose that we modify
>> mom_set_limits() to enforce stack and data segment limits if pmem is
>> specified, or if mem is specified and the job is serial.
>>   This should have the impact that serial jobs now have mem limits
>> enforced.  Are there any concerns with this change?
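For discussion, the proposed condition can be written out as a small predicate. This is only a sketch of the rule as worded above, not the actual mom_set_limits() code: the struct, its field names and the function name are all hypothetical, and "serial" is assumed here to mean a job with exactly one task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a job's memory requests; not a TORQUE type. */
struct job_mem_request {
    bool   has_pmem;   /* '-l pmem=X' was given */
    bool   has_mem;    /* '-l mem=X' was given */
    size_t task_count; /* assumption: 1 means the job is serial */
};

/*
 * Returns true when per-process limits would be applied under the
 * proposal: pmem is specified, or mem is specified and the job is serial.
 */
bool should_enforce_process_limits(const struct job_mem_request *req)
{
    assert(req != NULL);

    if (req->has_pmem)
        return true;
    return req->has_mem && req->task_count == 1;
}
```

Under this reading, a parallel job that only specifies mem is still left unenforced, which is the case the documentation would need to spell out.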
> First of all the documentation needs to define EXACTLY what the
> different limits are supposed to do. Today it is unclear if (v)mem is
> meant to limit the whole job (sum of all nodes/tasks) or just all
> processes on one node or all processes belonging to one task. And mem
> isn't mentioned in at least pbs_resources_linux, only vmem and p(v)mem.
> Make sure you do the "right" thing, i.e. don't touch stack_limit
> (RLIMIT_STACK); that will be taken care of by the system anyway and should
> preferably be left at whatever the system default is.
> Also remember that at least on Linux the kernel only handles RLIMIT_AS
> (vmem) in any real sense; RLIMIT_DATA is only checked in very special
> cases and RLIMIT_RSS is not used at all.
> So... if you want to enforce something other than RLIMIT_AS (pvmem as I
> interpret things anyway) you would have to do all of it in pbs_mom "by
> hand".
> Is pbs_mom really enforcing any limits "by hand" today? I was going to
> check the code but haven't gotten around to it...
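To make the RLIMIT_AS point concrete, here is a minimal sketch of a per-process address-space cap applied in a forked child before it does any work. It uses only the standard getrlimit()/setrlimit() calls; the helper names limit_address_space() and child_can_allocate() are invented for illustration and are not taken from pbs_mom.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Cap the calling process's soft RLIMIT_AS at `bytes`.
 * Returns 0 on success, -1 on error. */
int limit_address_space(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    /* The soft limit cannot exceed the hard limit, so clamp to it. */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_AS, &rl);
}

/* Fork a child, apply the limit there, and try one allocation.
 * Returns 1 if the allocation succeeded, 0 if it failed, -1 on error. */
int child_can_allocate(rlim_t limit_bytes, size_t alloc_bytes)
{
    pid_t pid;
    int status;

    assert(alloc_bytes > 0);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* volatile keeps the compiler from optimizing the malloc away */
        void *volatile block;

        if (limit_address_space(limit_bytes) != 0)
            _exit(2);
        block = malloc(alloc_bytes);
        _exit(block != NULL ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

On a typical Linux system, a child limited to 256 MB should fail to allocate 512 MB even though the memory is never touched, because the limit applies to address space rather than resident pages.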
> _______________________________________________
> torquedev mailing list
> torquedev at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torquedev
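As for doing it "by hand": one possible approach on Linux, sketched below, is to periodically read a process's resident set size from /proc/<pid>/status and compare it with the job's limit. This is an assumption about how such a check could look, not a description of what pbs_mom does today; the function names are hypothetical, and what to do when the limit is exceeded is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the VmRSS field (in kB) from /proc/<pid>/status.
 * Returns -1 if the file or the field is unavailable. */
long read_vmrss_kib(pid_t pid)
{
    char path[64];
    char line[256];
    long kib = -1;
    FILE *fp;

    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* Only the "VmRSS:" line matches; other lines leave kib alone. */
        if (sscanf(line, "VmRSS: %ld kB", &kib) == 1)
            break;
    }
    fclose(fp);
    return kib;
}

/* Returns 1 if the process is over the limit, 0 if not, -1 on error. */
int exceeds_rss_limit(pid_t pid, long limit_kib)
{
    long rss;

    assert(limit_kib >= 0);
    rss = read_vmrss_kib(pid);
    if (rss < 0)
        return -1;
    return rss > limit_kib;
}

/* Convenience wrapper that checks the calling process itself. */
int self_exceeds_rss_limit(long limit_kib)
{
    return exceeds_rss_limit(getpid(), limit_kib);
}
```

A real implementation would have to sum over all processes of a task or job, which is where the question of what mem and vmem are supposed to cover comes back in.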