[torqueusers] process using more CPUs than requested

Christopher Samuel samuel at unimelb.edu.au
Wed Mar 2 17:26:55 MST 2011


On 19/02/11 10:58, Gus Correa wrote:

> I never tried to check how Torque enforces memory
> use limits either.

Well, there are two parts: the limits that Torque can set
so that the kernel stops a program from allocating more
memory than was requested, and Torque itself reacting and
killing a process that has (somehow) got more RAM than
requested.

> Maybe this is because other than Matlab, we never 
> had problems with memory use anyway.

Wait till you get to play with COMSOL which seems to
think that linking in Eclipse (and hence Java) to its
MPI tasks is a good idea... :-(

At least with MATLAB you've got the -nojvm, -nodesktop
and -nosplash options to disable HPC unfriendly behaviour.

> If during execution the program increases its memory
> footprint above the queue limit or above requested
> limit, does it get killed?

I believe you can tell Maui/Moab to kill jobs which exceed
their requested limits.  In my experience it's much better
to restrict up front how much memory the job can allocate.

> Which of the 'pmem', 'pvmem', 'vmem' resource limits are
> effective in this regard?

Can't comment on the reactive "kill when exceeded" ones,
but to make malloc(3) et al. fail when a process requests
too much memory you have to use pvmem (not pmem), due to
how malloc(3) works in current glibc versions (it calls
mmap(2) rather than sbrk(2) for any non-trivial allocation,
and mmap(2) only honours RLIMIT_AS, not RLIMIT_DATA).
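
To see the RLIMIT_AS effect in isolation, here is a minimal sketch,
assuming Linux and Python's standard resource module.  It is not how
Torque applies pvmem; the helper name try_allocation and the sizes are
made up for illustration.  A child process lowers its own soft
RLIMIT_AS and then attempts a single allocation:

```python
import subprocess
import sys

GIB = 1024 ** 3

# Run in a child process so the limit never affects the caller.
CHILD_CODE = """
import resource, sys
limit, request = int(sys.argv[1]), int(sys.argv[2])
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)   # the soft limit may not exceed the hard limit
resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
try:
    buf = bytearray(request)   # one large allocation request
    print("granted")
except MemoryError:
    print("denied")
"""

def try_allocation(limit_bytes, request_bytes):
    """Return 'granted' or 'denied' for one allocation under RLIMIT_AS."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(limit_bytes), str(request_bytes)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(try_allocation(1 * GIB, 16 * 1024 ** 2))  # small request under a 1 GiB cap
    print(try_allocation(1 * GIB, 4 * GIB))         # request well above the cap
```

On a typical 64-bit Linux box the first call should report "granted"
and the second "denied", because the address-space limit makes the
large request fail at allocation time rather than later.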

> Does Torque control the memory use by child processes and/or
> by threads the program may spawn during execution?

With pvmem you effectively set a ulimit on the child processes,
so an MPI job with 4 child processes and pvmem=1gb can use up to
4GB all up, but each task can only get 1GB of memory.
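
The arithmetic behind that can be sketched as follows.  This is only
an illustration: parse_size and job_ceiling are hypothetical helpers,
not part of Torque, and the binary (1024-based) multipliers for the
size suffixes are an assumption about how Torque interprets them.

```python
# Assumed binary multipliers for Torque-style byte suffixes.
MULTIPLIERS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}

def parse_size(text):
    """Convert a size string such as '1gb' into a number of bytes."""
    text = text.strip().lower()
    # Try the longest suffixes first so "gb" is not mistaken for "b".
    for suffix in sorted(MULTIPLIERS, key=len, reverse=True):
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * MULTIPLIERS[suffix]
    return int(text)  # bare number: treated as plain bytes

def job_ceiling(pvmem, tasks):
    """Upper bound on a job's total memory when every task is capped at pvmem."""
    return parse_size(pvmem) * tasks

if __name__ == "__main__":
    # Four MPI tasks, each limited to 1gb.
    print(job_ceiling("1gb", 4) // 1024 ** 3, "GiB across the whole job")
```

The per-task cap is the only thing the kernel enforces here; the
job-wide figure is just the cap multiplied by the number of tasks.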

This doesn't help SMP jobs, so for those we have a dedicated
"smp" queue (for things like MATLAB) whose jobs request all
the cores on a node and which doesn't have a default pvmem
set (unlike our other queues).

cheers!
Chris
-- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/
