[torquedev] [PATCH] Change pbs_mom to set RLIMIT_AS instead of RLIMIT_DATA for mem/pmem limits.

Michael Barnes Michael.Barnes at jlab.org
Mon Jan 12 06:57:04 MST 2009


On Mon, Jan 12, 2009 at 09:52:34AM +1100, David Singleton wrote:
> Chris Samuel wrote:
> >----- "David Singleton" <David.Singleton at anu.edu.au> wrote:
> >
> >>How did RLIMIT_DATA get into limiting "mem"?
> >
> >I presume because it was the only resource limit that
> >affected malloc() under the Linux kernel. :-(
> >
> 
> Unfortunately that is imposing a form of pvmem limit, not a
> pmem limit.  Memory allocation should be limited by vmem/pvmem
> requests/limits, not by mem/pmem requests/limits.
> 
> Currently mem limits can only be imposed by MOM monitoring.

Let me chime in.

As far as I understand it, the difference between mem and vmem is that
vmem is enforced by the mom, and mem is only used for scheduling purposes
(at least with the Maui scheduler).  At our site, this works well.

As for the recent change from RLIMIT_DATA to RLIMIT_AS, to me it's a
coin toss. Maybe there should be a compile-time or runtime option for
this. I don't know. I do know that things like Java have a huge VM
footprint, which I would guess comes from mmap()ing rather than
malloc()ing data, but most other apps have more similar mem/vmem ratios.
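
To make the mmap() point concrete, here is a minimal sketch (not taken
from pbs_mom) of how RLIMIT_AS behaves. The helper name and the sizes in
the usage note are made up for illustration. It forks a child, caps the
child's address space, and reports whether a large anonymous mmap()
still succeeds.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not TORQUE code. Forks a child, caps RLIMIT_AS at
 * limit_bytes (0 means "leave the limit alone"), then tries to mmap()
 * map_bytes of anonymous memory in the child.
 * Returns 1 if the mapping succeeded, 0 if it failed, -1 on setup errors.
 */
int mmap_succeeds_under_as_limit(size_t limit_bytes, size_t map_bytes)
{
    assert(map_bytes > 0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (limit_bytes != 0) {
            struct rlimit rl;
            if (getrlimit(RLIMIT_AS, &rl) != 0)
                _exit(2);
            /* Only lower the soft limit; never ask for more than the hard limit. */
            rl.rlim_cur = (rlim_t)limit_bytes;
            if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur > rl.rlim_max)
                rl.rlim_cur = rl.rlim_max;
            if (setrlimit(RLIMIT_AS, &rl) != 0)
                _exit(2);
        }
        void *p = mmap(NULL, map_bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        _exit(p == MAP_FAILED ? 1 : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Calling mmap_succeeds_under_as_limit(32 MiB, 64 MiB) should report a
failed mapping, since the request alone is larger than the cap, while
the same mapping with no cap would normally go through. An application
that reserves a lot of address space up front would hit an RLIMIT_AS
cap long before it touches that much memory.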

Also, unless this has been fixed in subsequent versions, in 2.1.10
the file limit is broken on systems where an int is 32 bits and the
user requests a file limit over INT_MAX. Also, the mom tries to impose
this limit, which is kind of useless because an application can open
any number of files sequentially that are each under the limit, close
each one, and never hit its file limit. We've ripped out the
enforcement of the file limit, and only use it for scheduling (again,
this is with the Maui scheduler).
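
A rough sketch of why a per-file cap does not bound total output. This
assumes the limit behaves like RLIMIT_FSIZE, which applies to each file
separately; the post does not say how the mom actually imposes it, and
the helper name and the /tmp path are invented for the example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo, not TORQUE code. In a child process, cap RLIMIT_FSIZE
 * at limit_bytes, then write nfiles temporary files of file_bytes each, one
 * after another, closing and removing each before starting the next.
 * Returns the number of files written in full, or -1 on setup errors.
 */
int files_written_under_fsize_limit(size_t limit_bytes, size_t file_bytes,
                                    int nfiles)
{
    static const char chunk[4096];          /* zero-filled write buffer */

    assert(nfiles >= 0 && nfiles < 100);    /* count travels in an exit status */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit rl;
        signal(SIGXFSZ, SIG_IGN);   /* get EFBIG from write() instead of dying */
        if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
            _exit(255);
        rl.rlim_cur = (rlim_t)limit_bytes;
        if (setrlimit(RLIMIT_FSIZE, &rl) != 0)
            _exit(255);

        int done = 0;
        for (int i = 0; i < nfiles; i++) {
            char path[] = "/tmp/fsize_demo_XXXXXX";
            int fd = mkstemp(path);
            if (fd < 0)
                _exit(255);

            size_t left = file_bytes;
            int ok = 1;
            while (left > 0) {
                size_t n = left < sizeof chunk ? left : sizeof chunk;
                ssize_t w = write(fd, chunk, n);
                if (w <= 0) {       /* the per-file limit was reached */
                    ok = 0;
                    break;
                }
                left -= (size_t)w;
            }
            close(fd);
            unlink(path);
            if (ok)
                done++;
        }
        _exit(done);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 255 ? -1 : WEXITSTATUS(status);
}
```

With an 8 KiB cap, five 4 KiB files written one after another all
complete, for 20 KiB of total output, while a single 16 KiB file is cut
off. The limit only ever sees one file at a time.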

In src/resmom/linux/mom_mach.c, the getsize() routine returns an int.
This integer value also seems to be used for vmem, pvmem, etc.

If this hasn't already been fixed, I think it's worthwhile to make
getsize() return either an unsigned long or some other datatype that can
handle the limits that it can parse from the user's input.
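
As a sketch of the idea only: the parser below is hypothetical and is
not TORQUE's getsize(). The suffix set (b, kb, mb, gb, tb), the byte
units, and both function names are assumptions for illustration. It
parses into an unsigned long long and rejects overflow instead of
silently wrapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <strings.h>

/*
 * Hypothetical parser, not TORQUE's getsize(). Converts strings such as
 * "512mb" or "3gb" to a byte count.
 * Returns 0 and stores the result in *out on success, or -1 on malformed
 * input or if the value does not fit in an unsigned long long.
 */
int parse_size_bytes(const char *s, unsigned long long *out)
{
    assert(s != NULL && out != NULL);

    if (!isdigit((unsigned char)*s))
        return -1;

    char *end;
    errno = 0;
    unsigned long long n = strtoull(s, &end, 10);
    if (errno == ERANGE)
        return -1;

    unsigned shift;
    if (*end == '\0' || strcasecmp(end, "b") == 0)
        shift = 0;
    else if (strcasecmp(end, "kb") == 0)
        shift = 10;
    else if (strcasecmp(end, "mb") == 0)
        shift = 20;
    else if (strcasecmp(end, "gb") == 0)
        shift = 30;
    else if (strcasecmp(end, "tb") == 0)
        shift = 40;
    else
        return -1;

    /* Refuse values that would overflow once the multiplier is applied. */
    if (n > (ULLONG_MAX >> shift))
        return -1;

    *out = n << shift;
    return 0;
}

/* Returns 1 if v would not survive being stored in an int, else 0. */
int exceeds_int(unsigned long long v)
{
    return v > (unsigned long long)INT_MAX;
}
```

With a 32-bit int, a request of "3gb" parses to 3221225472 bytes, which
is above INT_MAX and would be mangled by an int return value, while
"512mb" still fits. Whatever wider type is chosen would also need to be
carried through to the point where the limit is applied.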

Regards,

-mb

-- 
+-----------------------------------------------
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+-----------------------------------------------

