[torquedev] Linux kernel/glibc ulimit strangeness
csamuel at vpac.org
Fri Nov 30 02:49:06 MST 2007
I was helping a customer of ours upgrade Torque from 1.2.x to
2.1.9 yesterday (as well as a Maui upgrade) so that they could
properly handle memory requests for their jobs with -l pmem=$foo.
One thing I ran into there was that although pbs_mom sets ulimits as
you would expect (data segment size, max memory size) for these jobs,
we found that they are not enforced by current glibc / kernel
configurations (not that they were bothered about this).
After a bit of head scratching I tracked it down to the fact that
somewhere around glibc 2.3 the malloc() implementation was
ripped out and replaced with one that uses mmap() for allocations of
128KB or more.
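
To see which path a given request takes, here is a rough sketch, not a definitive test. It assumes glibc's malloc with its default 128KB mmap threshold and a typical Linux address-space layout in which the brk heap sits just above the program's static data and mmap() regions live elsewhere. The names static_anchor, served_from_brk_heap() and malloc_uses_brk() are made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Any object in static storage; on typical Linux layouts it sits
 * below the brk heap (assumption, not guaranteed by any standard). */
static char static_anchor;

/* Heuristic: returns 1 if p lies between the static data and the
 * current program break, i.e. it most likely came from brk(). */
static int served_from_brk_heap(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)&static_anchor;
    uintptr_t hi = (uintptr_t)sbrk(0);
    return addr > lo && addr < hi;
}

/* Allocate `size` bytes and classify where the chunk landed.
 * Returns 1 for the brk heap, 0 for elsewhere (assumed to be an
 * mmap() region), -1 if malloc() failed.
 * The chunk is deliberately not freed: freeing an mmap()ed chunk
 * raises glibc's dynamic mmap threshold and would change the
 * outcome of later calls. */
static int malloc_uses_brk(size_t size)
{
    void *volatile p = malloc(size);
    if (p == NULL)
        return -1;
    return served_from_brk_heap(p);
}
```

Under those assumptions malloc_uses_brk(1024) should report the brk heap and malloc_uses_brk(1 << 20) should not. A different allocator, or a MALLOC_MMAP_THRESHOLD_ setting in the environment, can change the result.
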
The kicker is that the kernel mmap() implementation only cares about
the virtual memory ulimit (RLIMIT_AS) for these; the others are
ignored.
So currently an application which uses small allocations (<128KB) will
find malloc() failing when it hits its ulimit, whereas an application
that grabs RAM in larger chunks will sail happily past that
without a care in the world.
This raises two questions:
1) In the Linux pbs_mom should we be setting RLIMIT_AS in addition to
the others so that these limits are enforced regardless of which
allocation strategy is followed by the application?
2) Would it be possible to have a configuration option to disable
setting ulimits for those who want to use them as guidelines rather
than enforced limits (for non-expert users)?
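
For question 1, a minimal sketch of what capping RLIMIT_AS alongside RLIMIT_DATA looks like from the job's side. This is not pbs_mom code: cap_soft_limit() and malloc_ok_under_limits() are hypothetical helpers, and only the soft limit is lowered so that the example runs without privileges. A forked child stands in for the job so that the caller's own limits are untouched.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: lower the soft limit for `resource` to at
 * most `bytes`, leaving the hard limit alone. Returns 0 on success. */
static int cap_soft_limit(int resource, rlim_t bytes)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;   /* cannot exceed the existing hard limit */
    rl.rlim_cur = bytes;
    return setrlimit(resource, &rl);
}

/* Fork a child, cap RLIMIT_DATA (and optionally RLIMIT_AS) at
 * `limit` bytes, then attempt a single malloc() of `request` bytes.
 * Returns 1 if malloc() succeeded, 0 if it failed, -1 on setup error. */
static int malloc_ok_under_limits(rlim_t limit, size_t request,
                                  int also_cap_as)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (cap_soft_limit(RLIMIT_DATA, limit) != 0)
            _exit(2);
        if (also_cap_as && cap_soft_limit(RLIMIT_AS, limit) != 0)
            _exit(2);
        /* volatile keeps the compiler from optimising the call away */
        void *volatile p = malloc(request);
        _exit(p != NULL ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    if (WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}
```

With also_cap_as set, a 256MB request against a 64MB limit fails as one would hope, while a small request still succeeds. With also_cap_as cleared, the outcome depends on the kernel: on the kernels discussed here the large request goes through mmap() and succeeds, whereas Linux 4.7 and later also count private writable mappings against RLIMIT_DATA.
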
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency