[torqueusers] process using more CPUs than requested

Gus Correa gus at ldeo.columbia.edu
Fri Mar 4 13:35:43 MST 2011


Christopher Samuel wrote:
> 
> On 19/02/11 10:58, Gus Correa wrote:
> 
>> I never tried to check how Torque enforces memory
>> use limits either.
> 
> Well, there are two parts: the limits that Torque can
> set so that the kernel stops a program from requesting
> more memory than the job asked for, and Torque being
> able to react and kill a process that has (somehow)
> got more RAM than requested.
> 
>> Maybe this is because other than Matlab, we never 
>> had problems with memory use anyway.
> 
> Wait till you get to play with COMSOL which seems to
> think that linking in Eclipse (and hence Java) to its
> MPI tasks is a good idea... :-(
> 
> At least with MATLAB you've got the -nojvm, -nodesktop
> and -nosplash options to disable HPC-unfriendly behaviour.
> 
>> If during execution the program increases its memory
>> footprint above the queue limit or above requested
>> limit, does it get killed?
> 
> I believe you can tell Maui/Moab to kill jobs which exceed
> their requested limits.  In my experience it's much better
> to set restrictions on how much memory the job can
> allocate in the first place.
> 
>> Which of the 'pmem', 'pvmem', 'vmem' resource limits are
>> effective in this regard?
> 
> Can't comment on the reactive "kill when exceeded" ones,
> but to make malloc(3) et al. fail when a process requests
> too much memory you have to use pvmem (not pmem), because
> of how malloc(3) works in current glibc versions (it calls
> mmap(2), not sbrk(2), for any non-trivial allocation, and
> mmap(2) only honours RLIMIT_AS, not RLIMIT_DATA).
> 
>> Does Torque control the memory use by child processes and/or
>> by threads the program may spawn during execution?
> 
> With pvmem you effectively set a ulimit on the child processes,
> so an MPI job with 4 child processes and pvmem=1gb will use 4GB
> all up, but each task can only get 1GB of memory.
> 
> This doesn't help SMP jobs, so there we have a dedicated "smp"
> queue (for things like MATLAB) where jobs request all the cores
> on a node and which doesn't have a default pvmem set (unlike
> our other queues).
> 
> cheers!
> Chris
> -- 
>     Christopher Samuel - Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>          http://www.vlsci.unimelb.edu.au/
> 
>

Thank you for your very clear explanations, as usual.

I will experiment with the pvmem.
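
A small experiment along those lines, independent of Torque, might look
like the sketch below. It assumes Linux and Python 3, and it assumes that
a per-process virtual memory limit such as pvmem ends up as RLIMIT_AS on
the process, as described above. The function name and the sizes are
made up for illustration. A child process lowers its own RLIMIT_AS and
then attempts one large allocation, which should fail cleanly instead of
succeeding.

```python
import subprocess
import sys

# Code run in the child process: lower the soft RLIMIT_AS, then try
# to allocate one large buffer and report what happened.
CHILD_CODE = """
import resource, sys
limit, size = int(sys.argv[1]), int(sys.argv[2])
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)   # the soft limit may not exceed the hard limit
resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
try:
    buf = bytearray(size)      # large requests are served by mmap(2)
    print("allocated")
except MemoryError:
    print("MemoryError")
"""


def allocation_outcome(limit_bytes, alloc_bytes):
    """Hypothetical helper: run a child under an address-space limit.

    Returns "allocated" if the child obtained alloc_bytes, or
    "MemoryError" if the allocation was refused.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(limit_bytes), str(alloc_bytes)],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Calling allocation_outcome(1 << 30, 2 << 30) asks for 2 GiB under a
1 GiB limit, so the child should report the allocation as refused,
while a 1 MiB request under the same limit should go through.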

Indeed, for Matlab I just ask users to request a full node,
with no node sharing, so as not to harm other people's work.
I haven't tested Matlab with cpuset yet, since I only recently
installed Torque with cpuset support, but according to another Aussie,
Martin Thompson, there were some problems even with cpuset on.
The -nojvm, -nosplash, and -nodisplay options are all in my Matlab
straitjacket wrapper script that I tell users they must wear if
they want to run Matlab.
Maybe I could enforce it via a qsub wrapper, but so far they've been
cooperative.
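
For illustration only, a wrapper of that kind could be as simple as the
Python sketch below. This is not the actual script. The function names
and the assumption that the executable is called "matlab" and is on the
PATH are hypothetical. Only the three options come from the description
above.

```python
import os

# Options the wrapper always passes, as listed above.
REQUIRED_FLAGS = ["-nojvm", "-nosplash", "-nodisplay"]


def build_matlab_command(user_args, matlab="matlab"):
    """Return the argument list for Matlab with the required flags first.

    Flags the user already supplied are not repeated; all other user
    arguments are passed through unchanged and in order.
    """
    extra = [arg for arg in user_args if arg not in REQUIRED_FLAGS]
    return [matlab] + REQUIRED_FLAGS + extra


def main(argv):
    """Replace the current process with Matlab (assumes it is on PATH)."""
    cmd = build_matlab_command(argv)
    os.execvp(cmd[0], cmd)
```

A job script would then call main(sys.argv[1:]) from a small launcher
instead of invoking Matlab directly.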

Cheers,
Gus Correa

