[torqueusers] consensus on memory enforcement?

Åke Sandgren ake.sandgren at hpc2n.umu.se
Sat Jun 3 03:10:20 MDT 2006


On Fri, 2006-06-02 at 16:59 -0400, garrick at speculation.org wrote:
> I've got users that are abusing memory usage on Linux nodes and I'd
> like to clear this up in TORQUE.
> 
> What's the consensus on what changes should be made?  Linux should use
> RLIMIT_AS and nothing else?  How should the definitions of mem, pmem,
> pvmem, and vmem be clarified?  Ake seems to be our resident expert on
> these matters.

A short reply here.

Linux can only enforce RLIMIT_AS (and RLIMIT_STACK) with any degree of
certainty. RLIMIT_DATA is enforced, but the data segment only ever
changes on explicit sbrk/brk calls. Allocations via malloc are (almost)
always served by mmap of anonymous memory, which counts against AS.
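
A minimal sketch of what enforcement through RLIMIT_AS looks like from
user space, assuming a Linux host and Python's standard resource module.
The helper name and the sizes are made up for illustration; this is not
how pbs_mom does it.

```python
import resource
import subprocess
import sys


def allocation_succeeds_under_as_limit(limit_bytes, alloc_bytes):
    """Return True if a child process capped at limit_bytes of address
    space manages to allocate alloc_bytes in one go."""

    def cap_address_space():
        # Runs in the child between fork and exec. Only the soft limit is
        # changed, and it is never set above an existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = limit_bytes
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    # The child tries one large allocation and reports the outcome
    # through its exit status.
    child_code = (
        "import sys\n"
        "try:\n"
        "    buf = bytearray(int(sys.argv[1]))\n"
        "except MemoryError:\n"
        "    sys.exit(1)\n"
        "sys.exit(0)\n"
    )
    proc = subprocess.run(
        [sys.executable, "-c", child_code, str(alloc_bytes)],
        preexec_fn=cap_address_space,
    )
    return proc.returncode == 0
```

With a 1 GiB cap, a 1 MiB allocation should go through while a 2 GiB
one should be refused, since the large request needs more address space
than the limit allows.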

There is also the added problem of Maui disagreeing with TORQUE on how
to use these values: (p)mem and (p)vmem.

And, as someone alerted me to, shared libs may use LOTS of AS space
without ever accessing it, especially on 64-bit machines, forcing users
to request very large vmem values just to get their rather small code to
run.
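
The gap between address space and memory actually touched can be seen
by comparing VmSize and VmRSS in /proc/<pid>/status. The sketch below
assumes a Linux /proc filesystem. It uses an untouched anonymous mapping
as a stand-in for a large shared library, which is a simplification, and
all function names are hypothetical.

```python
import mmap


def parse_vm_fields(status_text):
    """Extract VmSize and VmRSS (both in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(rest.split()[0])
    return fields


def own_vm_fields():
    """Read the numbers for the current process (Linux only)."""
    with open("/proc/self/status") as f:
        return parse_vm_fields(f.read())


def growth_from_untouched_mapping(size_bytes):
    """Map anonymous memory without touching it and report how much
    VmSize and VmRSS grew, in kB."""
    before = own_vm_fields()
    region = mmap.mmap(-1, size_bytes)  # mapped, never read or written
    after = own_vm_fields()
    region.close()
    return {key: after[key] - before[key] for key in before}
```

On a typical Linux box, growth_from_untouched_mapping(256 * 1024 * 1024)
would be expected to show VmSize growing by roughly the mapped size
while VmRSS barely moves, which is the situation that forces the large
vmem requests.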

This really needs a good discussion since it also has to behave in a
sane manner on other OSes.

Another small problem is that it is hard to get sane values out of Linux
on how much memory is actually available for user consumption at any one
point in time. File caching and other temporary kernel caches are not
reported as free (and shouldn't be).
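
To illustrate why the number is hard to pin down, here is a rough
sketch that reads /proc/meminfo and adds the buffer and page cache
figures to MemFree. Treating all of Buffers and Cached as reclaimable is
an assumption made for this example and will usually overestimate what a
job can really get; the function names are hypothetical.

```python
def parse_meminfo(text):
    """Turn /proc/meminfo text into a dict of field name -> value in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def rough_available_kb(info):
    """Optimistic upper bound: free memory plus caches the kernel
    could drop. Not all cached pages are reclaimable in practice."""
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


def read_rough_available_kb(path="/proc/meminfo"):
    """Convenience wrapper for a live Linux system."""
    with open(path) as f:
        return rough_available_kb(parse_meminfo(f.read()))
```

The parsing is kept separate from the file read so the estimate can be
checked against a saved copy of /proc/meminfo from a compute node.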

As you can see from the above, I don't have any solution either.
(On our clusters the current code is working OK with the user community
we have.)

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake at hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se


