[torqueusers] process using more CPUs than requested
samuel at unimelb.edu.au
Wed Mar 2 17:26:55 MST 2011
On 19/02/11 10:58, Gus Correa wrote:
> I never tried to check how Torque enforces memory
> use limits either.
Well there are two parts, the limits that Torque can
set so that the kernel will stop a program being able
to request more memory than has been requested and then
Torque being able to react and kill a process that has
(somehow) got more RAM than requested.
> Maybe this is because other than Matlab, we never
> had problems with memory use anyway.
Wait till you get to play with COMSOL which seems to
think that linking in Eclipse (and hence Java) to its
MPI tasks is a good idea... :-(
At least with MATLAB you've got the -nojvm, -nodesktop
and -nosplash options to disable HPC-unfriendly behaviour.
> If during execution the program increases its memory
> footprint above the queue limit or above requested
> limit, does it get killed?
I believe you can tell Maui/Moab to kill jobs which exceed
their requested limits. In my experience it's much better
to get to set restrictions on how much memory the job can
> Which of the 'pmem', 'pvmem', 'vmem' resource limits are
> effective in this regard?
I can't comment on the reactive "kill when exceeded" ones,
but to make malloc(3) et al. fail when a process requests
too much memory you have to use pvmem (not pmem), due
to how malloc(3) works in current glibc versions (it calls
mmap(2), not sbrk(2), for any non-trivial allocation, and mmap(2)
only honours RLIMIT_AS, not RLIMIT_DATA).
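To illustrate the RLIMIT_AS behaviour described above, here is a minimal sketch in Python. It assumes a 64-bit Linux host and Python 3.7 or later. The helper name try_alloc and the 1 GiB limit are made up for the example, and this is not how Torque itself applies pvmem. A child process lowers its own soft RLIMIT_AS and then tries to allocate a buffer. A request larger than the limit should fail at allocation time, which is the effect a per-process address-space limit gives you.

```python
import subprocess
import sys

# Script run in a child process so the limit never affects the caller.
# argv[1] = address-space limit in bytes, argv[2] = bytes to allocate.
CHILD = r"""
import resource, sys
limit = int(sys.argv[1])
_, hard = resource.getrlimit(resource.RLIMIT_AS)
# Lower only the soft limit; the existing hard limit is left alone.
resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
try:
    buf = bytearray(int(sys.argv[2]))  # large request, served via mmap(2)
    print("allocated")
except MemoryError:
    print("MemoryError")
"""


def try_alloc(limit_bytes, alloc_bytes):
    """Hypothetical helper: report whether an allocation succeeds under RLIMIT_AS."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit_bytes), str(alloc_bytes)],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    one_gib = 1 << 30
    print(try_alloc(one_gib, 16 << 20))  # small request, under the limit
    print(try_alloc(one_gib, 2 << 30))   # request larger than the limit
```

The limit is set inside a child process on purpose, since a lowered RLIMIT_AS is inherited by everything the process starts afterwards.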
> Does Torque control the memory use by child processes and/or
> by threads the program may spawn during execution?
With pvmem you effectively set a ulimit on the child processes,
so an MPI job with 4 child processes and pvmem=1gb can use up to
4GB all up, but each task can only get 1GB of memory.
This doesn't help SMP jobs, so there we have a dedicated "smp"
queue (for things like MATLAB) which requests all the cores on
a node and doesn't have a default pvmem set (unlike our other
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545