[torquedev] [PATCH] Change pbs_mom to set RLIMIT_AS instead of RLIMIT_DATA for mem/pmem limits.

Chris Samuel csamuel at vpac.org
Sun Jan 11 16:08:39 MST 2009


----- "Åke Sandgren" <ake.sandgren at hpc2n.umu.se> wrote:

> It's not really broken in Torque. It still sets _RSS for pmem but it
> also sets DATA and STACK for some reason.

I'd argue that the Linux-specific code is broken for setting
RLIMIT_RSS, as it has never been enforced (aside from calls to
madvise(MADV_WILLNEED) on 2.4 kernels up to 2.4.29, according
to the getrlimit(2) man page).

If pbs_mom is going to set something it should be something
that is implemented, or not set anything at all.
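
To illustrate what "implemented" means here, below is a minimal sketch
(not pbs_mom code; the helper name is made up for this example). Assuming
a Linux host, it caps RLIMIT_AS in a forked child and reports whether a
malloc() larger than the cap is refused there.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Torque.
 * Forks a child, caps its address space at `limit` bytes via RLIMIT_AS,
 * and tries to malloc() `request` bytes there.
 * Returns 1 if the allocation failed, 0 if it succeeded, -1 on error.
 */
int alloc_fails_under_as_limit(rlim_t limit, size_t request)
{
    int status;
    int code;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit rl;
        void *volatile p; /* volatile so the malloc() call is not optimised away */

        rl.rlim_cur = limit;
        rl.rlim_max = limit;
        if (setrlimit(RLIMIT_AS, &rl) != 0)
            _exit(2);

        p = malloc(request);
        _exit(p == NULL ? 1 : 0);
    }

    /* Parent: the child's exit code carries the result. */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

With a 256MB cap, a 1GB request should be refused in the child, while a
1MB request should normally still succeed. The limit is only applied in
the child, so the calling process is unaffected.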

My thinking at the moment is that it shouldn't currently set any
limits for (p)mem, if only to make it plain to people that the
kernel cannot help here.

> There might be a way to solve this if the kernel starts using
> memorysets (like cpusets) but I haven't seen any real plans yet.

The cgroups work is interesting, but unfortunately you can't
mount both a cpuset and a cgroup VFS at the same time, and the
filesystem interface to cgroups differs from cpusets.

So we'll need to duplicate the code to handle it.  I think
the best start there will be to abstract the current cpuset
work into its own set of functions and then implement cgroup
equivalents.

The nice thing about cgroups is that with current kernels you
can include controls for access to devices (CONFIG_CGROUP_DEVICE),
which with GPGPU coming along would permit you to have functional
controls to ensure that only a program that requests access to
the GPU can access it.

It also gives nice memory stats in /dev/cgroup/$group/memory.stat,
including the RSS usage, etc.   The downside is that I can see at
least two different ways to set memory limits. :-(
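
As a rough sketch of how those stats could be picked up, assuming
memory.stat holds one "name value" pair per line (the function name is
invented for this example, and the path is whatever your cgroup mount
point happens to be):

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Torque.
 * Looks up `key` (e.g. "rss") in a file of "name value" lines, such as
 * a cgroup memory.stat file, and returns its value.
 * Returns -1 if the file cannot be opened or the key is not present.
 */
long long cgroup_stat_value(const char *path, const char *key)
{
    char name[64];
    long long value;
    long long result = -1;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;

    /* Each iteration consumes one "name value" pair. */
    while (fscanf(fp, "%63s %lld", name, &value) == 2) {
        if (strcmp(name, key) == 0) {
            result = value;
            break;
        }
    }

    fclose(fp);
    return result;
}
```

Something like cgroup_stat_value("/dev/cgroup/$group/memory.stat", "rss"),
with $group expanded by the caller, would then return the RSS figure,
or -1 if the group or key does not exist.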

> To make things more complicated there are situations where one would
> like to have different pvmem limits on different nodes in a job.
> i.e. nodes=1+2:ppn=8 with pvmem=16g+16:2g meaning 1 node with 16g and
> 2 nodes with 8 cores each with 2g.
> 
> So this resource limit setup really needs a good overhaul.

Run away! ;-)

But yes, I agree, more flexibility with these would be nice.

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

