[torqueusers] process using more CPUs than requested
Gus Correa
gus at ldeo.columbia.edu
Fri Mar 4 13:35:43 MST 2011
Christopher Samuel wrote:
> On 19/02/11 10:58, Gus Correa wrote:
>
>> I have never checked how Torque enforces memory
>> use limits either.
>
> Well, there are two parts: the limits that Torque can
> set so that the kernel will stop a program from
> allocating more memory than the job requested, and
> Torque being able to react and kill a process that has
> (somehow) got more RAM than requested.
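As an illustration of the second, reactive part only (not of how pbs_mom or Maui/Moab actually do it), here is a minimal Python sketch, assuming Linux and its /proc/<pid>/status file, of a watchdog check that compares one process's resident memory (VmRSS) with the amount requested. The function names are hypothetical, and a real monitor would poll periodically and sum usage over every process in the job.

```python
import os
import signal


def rss_bytes(pid: int) -> int:
    """Resident set size of `pid`, read from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:     12345 kB"
                return int(line.split()[1]) * 1024
    return 0  # e.g. kernel threads have no VmRSS line


def exceeds_request(pid: int, requested_bytes: int) -> bool:
    """True if the process currently holds more RAM than was requested."""
    return rss_bytes(pid) > requested_bytes


def enforce(pid: int, requested_bytes: int) -> bool:
    """Reactive check: send SIGTERM if the process is over its request.

    Returns True if a signal was sent, False otherwise.
    """
    if exceeds_request(pid, requested_bytes):
        os.kill(pid, signal.SIGTERM)
        return True
    return False
```

Note the inherent weakness Chris alludes to: by the time such a check fires, the memory has already been taken from other jobs on the node.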
>
>> Maybe this is because, apart from Matlab, we have never
>> had problems with memory use anyway.
>
> Wait till you get to play with COMSOL, which seems to
> think that linking in Eclipse (and hence Java) to its
> MPI tasks is a good idea... :-(
>
> At least with MATLAB you've got the -nojvm, -nodesktop
> and -nosplash options to disable HPC-unfriendly behaviour.
>
>> If during execution the program increases its memory
>> footprint above the queue limit or above requested
>> limit, does it get killed?
>
> I believe you can tell Maui/Moab to kill jobs which exceed
> their requested limits. In my experience it's much better
> to set restrictions on how much memory the job can
> allocate in the first place.
>
>> Which of the 'pmem', 'pvmem', 'vmem' resource limits are
>> effective in this regard?
>
> Can't comment on the reactive "kill when exceeded" ones,
> but to make malloc(3) et al. fail when a process requests
> too much memory you have to use pvmem (not pmem) due
> to how malloc(3) works in current glibc versions (it calls
> mmap(2), not sbrk(2), for any allocation above
> M_MMAP_THRESHOLD, 128 KiB by default, and mmap(2) only
> honours RLIMIT_AS, not RLIMIT_DATA).
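A minimal Python sketch of the first, preventive part, assuming Linux with glibc and (following the explanation above) that pvmem is applied as RLIMIT_AS on each process: a child interpreter gets a capped address space and then tries one large allocation. The helper name and the sizes are illustrative only. On much newer kernels (Linux 4.7 and later) RLIMIT_DATA also counts private anonymous mappings, so only the RLIMIT_AS behaviour is demonstrated here.

```python
import resource
import subprocess
import sys

MiB = 1024 ** 2
GiB = 1024 ** 3


def allocation_succeeds(as_limit: int, request: int) -> bool:
    """Run a child Python whose RLIMIT_AS soft limit is (at most) `as_limit`
    bytes and ask it to allocate `request` bytes. True if that worked."""
    def cap_address_space() -> None:
        # Runs in the child between fork() and exec(). Only the soft limit
        # is lowered, so this also works when the hard limit is finite.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = as_limit if hard == resource.RLIM_INFINITY else min(as_limit, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    # A large bytearray goes through malloc(3); for a request this size glibc
    # uses mmap(2), which fails with ENOMEM once RLIMIT_AS would be exceeded.
    # Python turns that into MemoryError, i.e. a non-zero exit status.
    child = subprocess.run(
        [sys.executable, "-c", f"bytearray({request})"],
        preexec_fn=cap_address_space,
        capture_output=True,
    )
    return child.returncode == 0
```

Under a 512 MiB cap, `allocation_succeeds(512 * MiB, 16 * MiB)` should be true and `allocation_succeeds(512 * MiB, 2 * GiB)` false, because the 2 GiB mapping alone exceeds the address-space limit.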
>
>> Does Torque control the memory use by child processes and/or
>> by threads the program may spawn during execution?
>
> With pvmem you effectively set a ulimit on each child process,
> so an MPI job with 4 child processes and pvmem=1gb can use up
> to 4GB in total, but each task can only get 1GB of memory.
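To illustrate the per-process arithmetic, a minimal Python sketch in which subprocess's preexec_fn stands in for the resource manager setting the limit before each task starts (an assumption about mechanism, not Torque code). It also assumes the hard RLIMIT_AS is unlimited or at least 1 GiB. Every task reports the same cap, so the limits add up across the job rather than being shared.

```python
import resource
import subprocess
import sys

GiB = 1024 ** 3


def launch_tasks(n_tasks: int, per_process_cap: int) -> list[int]:
    """Start n_tasks child processes with the same per-process RLIMIT_AS cap
    and return the soft limit each child reports for itself."""
    def apply_cap() -> None:
        # Same cap for every task: it limits each process, not the job total.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (per_process_cap, hard))

    report = "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"
    tasks = [
        subprocess.Popen([sys.executable, "-c", report],
                         stdout=subprocess.PIPE, text=True,
                         preexec_fn=apply_cap)
        for _ in range(n_tasks)
    ]
    # communicate() waits for each task and returns its printed limit.
    return [int(t.communicate()[0]) for t in tasks]
```

For example, `launch_tasks(4, GiB)` should return four entries of 1073741824 bytes, 4 GiB in total, mirroring the pvmem=1gb, 4-task case above.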
>
> This doesn't help SMP jobs, so for those we have a dedicated "smp"
> queue (for things like MATLAB) whose jobs request all the cores on
> a node and which has no default pvmem set (unlike our other
> queues).
>
> cheers!
> Chris
> --
> Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.unimelb.edu.au/
>
>
Thank you for your very clear explanations, as usual.
I will experiment with the pvmem.
Indeed for Matlab I just ask users to request a full node,
no node sharing, so as not to harm other people's work.
I haven't tested Matlab with cpuset yet, as I only recently installed
Torque with cpuset support, but according to another Aussie,
Martin Thompson, there were some problems even with cpuset enabled.
The -nojvm, -nosplash and -nodisplay options are all in my Matlab
"straitjacket" wrapper script, which I tell users they must wear if
they want to run Matlab.
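The wrapper script itself isn't shown in the thread; a minimal Python sketch of the idea, assuming MATLAB is on PATH as `matlab`. The function names are hypothetical; only the three startup options come from the text above.

```python
import os

# Startup options the wrapper always adds (the three named above).
REQUIRED_FLAGS = ["-nojvm", "-nosplash", "-nodisplay"]


def matlab_argv(user_args: list[str], matlab: str = "matlab") -> list[str]:
    """Build the MATLAB command line: mandatory flags first, then the
    user's own arguments, dropping any mandatory flag the user repeated."""
    extra = [a for a in user_args if a not in REQUIRED_FLAGS]
    return [matlab, *REQUIRED_FLAGS, *extra]


def run(user_args: list[str]) -> None:
    """Replace the wrapper process with MATLAB (does not return on success)."""
    argv = matlab_argv(user_args)
    os.execvp(argv[0], argv)
```

Installed as an executable script (name hypothetical), it would end with `run(sys.argv[1:])`, so that `wrapper -r run_model` becomes `matlab -nojvm -nosplash -nodisplay -r run_model`.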
Maybe I could enforce it via a qsub wrapper, but so far they've been
cooperative.
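One way such enforcement might look, sketched in Python under the assumption that Torque's job submission filter mechanism (the SUBMITFILTER setting in torque.cfg) is used: as I understand it, qsub feeds the job script to the filter on stdin, submits whatever the filter writes to stdout, and rejects the job if the filter exits non-zero. The policy (reject any line that starts matlab without all three options) and the function names are hypothetical.

```python
import re
import sys

REQUIRED_FLAGS = ("-nojvm", "-nosplash", "-nodisplay")
# Hypothetical policy: a line that starts MATLAB directly, by name or full path.
MATLAB_CALL = re.compile(r"^\s*(?:\S*/)?matlab(?:\s|$)")


def violations(script: str) -> list[str]:
    """Return the lines that start MATLAB without every required option."""
    bad = []
    for line in script.splitlines():
        if line.lstrip().startswith("#"):
            continue  # comments and #PBS directives
        words = line.split()
        if MATLAB_CALL.match(line) and not all(f in words for f in REQUIRED_FLAGS):
            bad.append(line)
    return bad


def filter_job(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr) -> int:
    """Filter entry point: pass the script through unchanged, or reject it."""
    script = stdin.read()
    bad = violations(script)
    if bad:
        for line in bad:
            print(f"rejected, MATLAB needs {' '.join(REQUIRED_FLAGS)}: {line}",
                  file=stderr)
        return 1
    stdout.write(script)
    return 0
```

Installed as the filter, the script would end with `sys.exit(filter_job())`.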
Cheers,
Gus Correa