[torqueusers] How to allow a job to use all memory on a node with cpusets enabled?

David Beer dbeer at adaptivecomputing.com
Tue Aug 27 16:41:15 MDT 2013


The long-term fix for this problem is to make the mom more intelligent
about assigning these resources. Moms will begin to hold a better picture
of the node's internal layout and will assign memory to jobs accordingly.
This fix is already most of the way done, but since the code for 4.2.5 is
already frozen, it will land in 4.2.6 instead.

David


On Tue, Aug 27, 2013 at 2:24 PM, Derek Gottlieb
<dgottlieb at exchange.asc.edu> wrote:

> What version of torque are you running?  I know I reported this issue to
> them and they partially fixed it in 4.2.4, but there are still some pretty
> major shortcomings in their handling of memory in cpusets.  I've
> documented some sample scenarios that are problematic, and they're
> supposed to be thinking about how to address them.
>
> As a short-term fix, we rewrite the cpuset in the job prologue script to
> grant every job access to all mems in the node when it starts.  I suspect
> the Linux kernel will handle memory allocation more intelligently than
> torque does when it assigns mems to cpusets.
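>
> A minimal sketch of such a prologue (the paths are assumptions: cpusets
> mounted at /dev/cpuset, per-job cpusets created under /dev/cpuset/torque,
> and the job id passed as the first prologue argument; on some systems the
> file is named cpuset.mems rather than mems):
>
>   #!/bin/sh
>   # prologue: $1 is the torque job id
>   jobdir="/dev/cpuset/torque/$1"
>   if [ -d "$jobdir" ]; then
>       # Copy the root cpuset's mems (every NUMA node) into the job's
>       # cpuset, so the kernel decides memory placement, not torque.
>       cat /dev/cpuset/mems > "$jobdir/mems"
>   fi
>   exit 0
>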
>
> Derek Gottlieb
> HPC Systems Analyst, CSC
> Alabama Supercomputer Center
>
> 686 Discovery Dr., Huntsville, AL 35806
> High Performance Computing | dgottlieb at asc.edu | www.asc.edu
>
> On Aug 27, 2013, at 8:22 AM, François P-L wrote:
>
> > Hi,
> >
> > We are encountering some problems with jobs requesting too much memory.
> >
> > For example, a job requests 4 CPUs and 126 GB:
> > pbs_mom: LOG_INFO::create_job_cpuset, creating cpuset for job 235376[2]: 4 cpus (0-3), 1 mems (0)
> >
> > For my test I use "stress" with the following command:
> > stress -c 2 -t 600 --vm 2 --vm-bytes 61G
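> >
> > To double-check what the mom assigned, the job's cpuset can be read
> > directly (the path is an assumption based on the usual torque cpuset
> > layout; the brackets in the job id need quoting):
> >
> >   cat "/dev/cpuset/torque/235376[2]/cpus"   # -> 0-3
> >   cat "/dev/cpuset/torque/235376[2]/mems"   # -> 0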
> >
> > My node has the following topology:
> > Machine (128GB)
> >   NUMANode L#0 (P#0 64GB) + Socket L#0 + L3 L#0 (20MB)
> >     L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
> >     L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
> >     L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
> >     L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
> >     L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
> >     L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
> >     L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
> >     L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
> >   NUMANode L#1 (P#1 64GB) + Socket L#1 + L3 L#1 (20MB)
> >     L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
> >     L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
> >     L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
> >     L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
> >     L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
> >     L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
> >     L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
> >     L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
> >
> > After a few seconds:
> > kernel: [517453.738199] stress invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
> > (...)
> > kernel: [517453.738204] stress cpuset=235376[2] mems_allowed=0
> > (...)
> > The two VM workers allocate 2 x 61 GB = 122 GB in total, but
> > mems_allowed=0 confines the job to NUMA node 0 and its 64 GB, so the
> > OOM killer fires.
> >
> > Reading through the qsub options, the "-n" (node-exclusive) flag can
> > "solve" the problem, but it is a big waste of CPU in this case, since
> > the whole node gets dedicated to a 4-CPU job.
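> >
> > For illustration (the exact resource list here is hypothetical):
> >
> >   qsub -n -l nodes=1:ppn=4,mem=126gb job.sh
> >
> > The exclusive allocation gives the job's cpuset all mems, but leaves
> > the node's other 12 cores idle.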
> >
> > Is there a way to allow a job to use all the memory of a node without
> > claiming all of its CPUs?
> >
> > Many thanks in advance.
> >



-- 
David Beer | Senior Software Engineer
Adaptive Computing

