[torqueusers] How to allow a job to use all memory on a node with cpuset enabled?

Derek Gottlieb dgottlieb at exchange.asc.edu
Tue Aug 27 14:24:17 MDT 2013


What version of torque are you running?  I reported this issue to them and it was partially fixed in 4.2.4, but there are still some major shortcomings in their handling of memory in cpusets.  I've documented some problematic sample scenarios, and they're supposed to be thinking about how to address them.

As a short-term fix, we rewrite the cpuset in the job prologue script so that every job is granted access to all mems on the node when it starts.  I suspect the Linux kernel will handle memory allocation more intelligently than torque does when it assigns mems to cpusets.
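For reference, here is a minimal sketch of that prologue rewrite, assuming the cpuset filesystem is mounted at /dev/cpuset and that pbs_mom creates the per-job cpuset under /dev/cpuset/torque/<jobid> (verify both against your own mount point and mom configuration):

    #!/bin/sh
    # Prologue sketch: open the job's cpuset up to all memory nodes.
    # Assumption: cpusets are mounted at /dev/cpuset and the job's cpuset
    # lives at /dev/cpuset/torque/<jobid>; adjust paths for your site.
    jobid="$1"                            # first prologue argument is the job id
    jobcpuset="/dev/cpuset/torque/$jobid"

    if [ -d "$jobcpuset" ]; then
        # Copy the node's full mems list into the job's cpuset.  The file is
        # named "cpuset.mems" or "mems" depending on how the fs was mounted.
        if [ -f "$jobcpuset/cpuset.mems" ]; then
            cat /dev/cpuset/cpuset.mems > "$jobcpuset/cpuset.mems"
        else
            cat /dev/cpuset/mems > "$jobcpuset/mems"
        fi
    fi
    exit 0

After the prologue runs, cat /dev/cpuset/torque/<jobid>/mems (or cpuset.mems) should list all of the node's memory nodes, e.g. 0-1 on a two-socket machine like yours.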

Derek Gottlieb
HPC Systems Analyst, CSC
Alabama Supercomputer Center

686 Discovery Dr., Huntsville, AL 35806
High Performance Computing | dgottlieb at asc.edu | www.asc.edu

On Aug 27, 2013, at 8:22 AM, François P-L wrote:

> Hi,
> 
> We are encountering some problems with jobs requesting too much memory.
> 
> For example, a job requests 4 CPUs and 126 GB:
> pbs_mom: LOG_INFO::create_job_cpuset, creating cpuset for job 235376[2]: 4 cpus (0-3), 1 mems (0)
> 
> For my test I use "stress" with the following command:
> stress -c 2 -t 600 --vm 2 --vm-bytes 61G
> 
> My node has this topology:
> Machine (128GB)
>   NUMANode L#0 (P#0 64GB) + Socket L#0 + L3 L#0 (20MB)
>     L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>     L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>     L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>     L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>     L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>     L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>     L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>     L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>   NUMANode L#1 (P#1 64GB) + Socket L#1 + L3 L#1 (20MB)
>     L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
>     L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
>     L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
>     L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
>     L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
>     L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
>     L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
>     L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
> 
> After a few seconds:
> kernel: [517453.738199] stress invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
> (...)
> kernel: [517453.738204] stress cpuset=235376[2] mems_allowed=0
> (...)
> 
> After reading the qsub options, the "-n" option can "solve" the problem... but it wastes a lot of CPU in this case (the whole node is dedicated to this job).
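> For example, something like:
> 
>     qsub -n -l nodes=1:ppn=4 job.sh
> 
> gives the job the node exclusively (and so all of its memory), but leaves 12 of the 16 cores idle.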
> 
> Is there a way to allow a job to use all the memory of a node without using all its CPUs?
> 
> Many thanks in advance.
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


