[torqueusers] How to allow a job to use all the memory on a node with cpusets enabled?

François P-L francois.prudhomme at hotmail.fr
Wed Aug 28 01:22:43 MDT 2013


Many thanks for your answers :)
I'm running version 4.1.6.h2, patched for the problem reported in https://github.com/adaptivecomputing/torque/issues/168. Granting access to all mems in the prologue script sounds like a good temporary fix. Do you do it with hwloc commands or directly in /dev/cpuset/ ?
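For context, here is roughly what I see on a node today (the /dev/cpuset/torque/<jobid> path is only my guess at where pbs_mom keeps the per-job cpuset on our setup):

    # the job from my original mail below is confined to a single NUMA node
    cat "/dev/cpuset/torque/235376[2]/mems"
    0

So only the first 64GB NUMA node is allowed, which is why the OOM killer fires once the job goes past 64GB.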
I will wait for 4.2.6, and hope it will resolve this type of problem.
Totally off topic: is there a way to prevent the use of the "-n" option? It could be a big problem if my users start using it... some of them make very poor estimates of how many CPUs they need... I set up cpusets on the cluster because of them...
Thanks again

Date: Tue, 27 Aug 2013 16:41:15 -0600
Subject: Re: [torqueusers] How to allow a job to use all the memory on a node with cpusets enabled?
From: dbeer at adaptivecomputing.com
To: torqueusers at supercluster.org
CC: francois.prudhomme at hotmail.fr

The long-term fix for this problem is to make the mom more intelligent about assigning these resources. Moms will start to keep a better picture of the node's internal layout and will assign memory to jobs more intelligently. This fix is already most of the way done, but since the code for 4.2.5 is already frozen, it will land in 4.2.6 instead.

David

On Tue, Aug 27, 2013 at 2:24 PM, Derek Gottlieb <dgottlieb at exchange.asc.edu> wrote:

What version of torque are you running?  I know I reported this issue to them and they partially fixed it in 4.2.4, but there are still some pretty major shortcomings in their handling of memory in cpusets.  I've documented some sample scenarios that are problematic, and they're supposed to be thinking about how to address them.




As a short-term fix, we rewrite the cpuset in the job prologue script to grant every job access to all the mems on the node when it starts.  I suspect the Linux kernel will handle memory placement more intelligently than torque currently does when it assigns mems to cpusets.
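Roughly, the relevant prologue fragment looks like the sketch below. Treat it as a sketch only: the /dev/cpuset mount point and the torque/<jobid> naming are assumptions about the usual pbs_mom cpuset layout, and on cgroup-style mounts the file is cpuset.mems rather than mems.

    #!/bin/sh
    # prologue: pbs_mom passes the job id as the first argument
    jobid="$1"
    jobset="/dev/cpuset/torque/${jobid}"      # per-job cpuset (assumed path)
    if [ -d "$jobset" ]; then
        # copy the parent cpuset's full mems list (e.g. "0-1" on a two-socket node)
        # into the job's cpuset so the kernel can allocate from every NUMA node
        cat /dev/cpuset/torque/mems > "${jobset}/mems"
    fi
    exit 0   # never fail the prologue over this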




Derek Gottlieb
HPC Systems Analyst, CSC
Alabama Supercomputer Center
686 Discovery Dr., Huntsville, AL 35806
High Performance Computing | dgottlieb at asc.edu | www.asc.edu

On Aug 27, 2013, at 8:22 AM, François P-L wrote:

> Hi,
>
> We are encountering some problems with jobs asking for too much memory.
>
> For example, a job asks for 4 CPUs and 126GB:
> pbs_mom: LOG_INFO::create_job_cpuset, creating cpuset for job 235376[2]: 4 cpus (0-3), 1 mems (0)
>
> For my test I use "stress" with the following command:
> stress -c 2 -t 600 --vm 2 --vm-bytes 61G
>
> My node has this topology:
> Machine (128GB)
>   NUMANode L#0 (P#0 64GB) + Socket L#0 + L3 L#0 (20MB)
>     L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>     L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>     L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>     L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>     L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>     L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>     L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>     L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>   NUMANode L#1 (P#1 64GB) + Socket L#1 + L3 L#1 (20MB)
>     L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
>     L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
>     L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
>     L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
>     L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
>     L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
>     L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
>     L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
>
> After a few seconds:
> kernel: [517453.738199] stress invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
> (...)
> kernel: [517453.738204] stress cpuset=235376[2] mems_allowed=0
> (...)
>
> After reading the qsub options, the "-n" option can "solve" the problem... but it is a big waste of CPU in this case (the whole node becomes dedicated to this one job).
>
> Is there a way to allow a job to use all the memory of a node without using all of its CPUs?
>
> Many thanks in advance.




_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers



-- 
David Beer | Senior Software Engineer
Adaptive Computing

