[torqueusers] job exceeds memory limit without being killed
chris at csamuel.org
Sat Mar 20 06:18:09 MDT 2010
On Fri, 19 Mar 2010 12:26:05 am Anton Starikov wrote:
> Which means that pbs_mom had already registered memory usage above the limit,
> and even updated this information on the server, but didn't react and kill the
> job. What could be wrong? Am I missing something in the config?
I think you are misunderstanding what the mem/vmem/pmem/pvmem limits in Torque
actually do - they apply resource limits (ulimits in the shell, RLIMITs in
terms of the kernel APIs) to the processes that pbs_mom launches.
The problem is that, historically, malloc() in glibc simply called brk(), and
in the Linux kernel brk() obeys the RLIMIT_DATA limit that pbs_mom sets for
the mem and pmem requests. But glibc then changed: it now calls mmap() for
allocations over a certain size, and mmap() in the Linux kernel does not
observe RLIMIT_DATA.
Perhaps the simplest fix is to translate any request for mem or pmem into vmem
or pvmem, as those set the RLIMIT_AS limit, which mmap() does observe (unlike
RLIMIT_DATA); alternatively, use the Maui/Moab tricks, which use the memory
usage reported by the node to decide whether or not to kill the job.
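The suggested translation is just a change in the resource request at submission time. A sketch (job script name and sizes are made up for illustration):

```shell
# Instead of requesting physical memory, which pbs_mom enforces
# via RLIMIT_DATA:
#   qsub -l nodes=1,pmem=2gb myjob.sh
# request per-process virtual memory, which maps to RLIMIT_AS:
qsub -l nodes=1,pvmem=2gb myjob.sh

# or equivalently as a directive inside the job script:
#PBS -l pvmem=2gb
```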
For more information on the various RLIMITs, see the setrlimit() manual page.
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP