[torquedev] Linux kernel/glibc ulimit strangeness

Chris Samuel csamuel at vpac.org
Fri Nov 30 02:49:06 MST 2007


Hi all,

I was helping a customer of ours upgrade Torque from 1.2.x to 
2.1.9 yesterday (as well as a Maui upgrade) so that they could 
properly handle memory requests for their jobs with -l pmem=$foo.

One thing I ran into there was that although pbs_mom sets ulimits as 
you would expect (data segment size, max memory size) for these jobs, 
we found that they are not enforced by current glibc / kernel 
configurations (not that they were bothered about this).

After a bit of head scratching I tracked it down to the fact that 
somewhere around glibc 2.3 the malloc() implementation was ripped 
out and replaced with one that uses mmap() for allocations of 
128KB or more.

The kicker is that the kernel mmap() implementation only cares about 
the virtual memory ulimit (RLIMIT_AS) for these; the others are 
ignored.

So currently an application which uses small allocations (<128KB) will 
find malloc() failing when it hits its ulimit, whereas an application 
that grabs RAM in larger chunks will sail happily past that limit 
without a care in the world.
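
For anyone wanting to check what a job actually sees, a minimal sketch 
along these lines can be run from inside the job. Python is used purely 
for brevity, show_limits is a made-up helper name, and the pairing of 
RLIMIT_DATA / RLIMIT_RSS with "data segment size" / "max memory size" 
is the usual ulimit naming, not something taken from the pbs_mom source.

```python
import resource

# Limits relevant to this thread, labelled roughly as `ulimit -a` names them.
LIMITS = {
    "data seg size (RLIMIT_DATA)": resource.RLIMIT_DATA,
    "max memory size (RLIMIT_RSS)": resource.RLIMIT_RSS,
    "virtual memory (RLIMIT_AS)": resource.RLIMIT_AS,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def show_limits():
    """Print and return {label: (soft, hard)} for the current process."""
    result = {}
    for label, which in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        result[label] = (soft, hard)
        print(f"{label}: soft={fmt(soft)} hard={fmt(hard)}")
    return result


if __name__ == "__main__":
    show_limits()
```

If the first two show a value and the third says "unlimited", the job 
is in the situation described above.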

This raises two questions:

1) In the Linux pbs_mom should we be setting RLIMIT_AS in addition to 
the others so that these limits are enforced regardless of which 
allocation strategy is followed by the application?

2) Would it be possible to have a configuration option to disable 
setting ulimits for those who want to use them as guidelines rather 
than enforced limits (for non-expert users)?
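
To make question 1 concrete, here is a rough sketch of the idea and not 
a patch: pbs_mom is C, and Python is used here only to keep it short. 
The helper names (clamp, make_limiter, big_allocation_refused) and the 
1 GiB / 2 GiB figures are assumptions for illustration. The child gets 
RLIMIT_AS set alongside RLIMIT_DATA before exec, and then tries a 
single allocation well above the mmap() threshold.

```python
import resource
import subprocess
import sys

GIB = 1024 ** 3


def clamp(limit, hard):
    """Never ask for a soft limit above the existing hard limit."""
    return limit if hard == resource.RLIM_INFINITY else min(limit, hard)


def make_limiter(limit_bytes):
    """Build a function that applies the limits in the child before exec."""
    def apply_limits():
        # RLIMIT_AS is the extra limit proposed in question 1.
        # Only the soft limit is lowered; the hard limit is left alone.
        for which in (resource.RLIMIT_DATA, resource.RLIMIT_AS):
            _, hard = resource.getrlimit(which)
            resource.setrlimit(which, (clamp(limit_bytes, hard), hard))
    return apply_limits


def big_allocation_refused(limit_bytes=GIB, request_bytes=2 * GIB):
    """Run a child under the limits; True if one large allocation fails."""
    child = (
        "import sys\n"
        "try:\n"
        f"    bytearray({request_bytes})\n"
        "except MemoryError:\n"
        "    sys.exit(42)\n"
    )
    proc = subprocess.run(
        [sys.executable, "-c", child],
        preexec_fn=make_limiter(limit_bytes),
    )
    return proc.returncode == 42


if __name__ == "__main__":
    print("large allocation refused:", big_allocation_refused())
```

With RLIMIT_AS in place the large request is refused no matter how 
malloc() obtains the memory. Note that Linux 4.7 and later also count 
mmap() against RLIMIT_DATA, so reproducing the original behaviour 
needs a kernel older than that.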

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

