[torqueusers] How to allow a job to use all memory on a node with cpuset enabled?

Matt Britt msbritt at umich.edu
Wed Aug 28 08:16:37 MDT 2013


We do the same thing, cat-ing directly to /dev/cpuset via the prologue
scripts.
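
For anyone who wants a starting point, here is a minimal sketch of that prologue
step. It assumes the cpuset filesystem is mounted at /dev/cpuset, that the job's
cpuset lives under /dev/cpuset/torque/<jobid> (a common Torque layout, but check
your own install), and that the prologue receives the job id as its first
argument; on some kernels the control file is named cpuset.mems rather than mems:

    #!/bin/sh
    # prologue sketch: give the job's cpuset access to every NUMA memory node
    jobid="$1"                                  # Torque passes the job id as $1
    cpuset_dir="/dev/cpuset/torque/$jobid"      # adjust to your cpuset mount/layout

    if [ -d "$cpuset_dir" ]; then
        if [ -f "$cpuset_dir/cpuset.mems" ]; then
            # newer kernels prefix the control files with "cpuset."
            cat /dev/cpuset/cpuset.mems > "$cpuset_dir/cpuset.mems"
        else
            # older mounts expose the files without the prefix
            cat /dev/cpuset/mems > "$cpuset_dir/mems"
        fi
    fi

    exit 0

Copying the root cpuset's mems instead of hard-coding a value like "0-1" keeps
the script working on nodes with a different NUMA layout.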

 - Matt




--------------------------------------------
Matthew Britt
CAEN HPC Group - College of Engineering
msbritt at umich.edu



On Wed, Aug 28, 2013 at 3:22 AM, François P-L <francois.prudhomme at hotmail.fr> wrote:

> Many thanks for your answers :)
>
> I'm running version 4.1.6.h2, patched for the problem described in
> https://github.com/adaptivecomputing/torque/issues/168.
> Granting access to all mems in the prologue script is a good temporary fix.
> Do you do it with hwloc commands or directly in /dev/cpuset/?
>
> I will wait for 4.2.6, and hope it will resolve this type of problem.
>
> Totally off topic: is there a way to prevent the use of the "-n" option? It
> could be a big problem if my users start using it... some of them make very
> poor estimates of how many CPUs they need... I enabled cpusets on the cluster
> because of them...
>
> Thanks again
>
> ------------------------------
> Date: Tue, 27 Aug 2013 16:41:15 -0600
>
> Subject: Re: [torqueusers] How to allow a job to use all memory on a node with cpuset enabled?
> From: dbeer at adaptivecomputing.com
> To: torqueusers at supercluster.org
> CC: francois.prudhomme at hotmail.fr
>
>
> The long-term fix for this problem is to make the mom more intelligent about
> assigning these resources. Moms will keep a better picture of the node's
> internal layout and will assign memory to jobs more intelligently. This fix
> is already most of the way done, but since 4.2.5 is already code-frozen, it
> will land in 4.2.6 instead.
>
> David
>
>
> On Tue, Aug 27, 2013 at 2:24 PM, Derek Gottlieb <dgottlieb at exchange.asc.edu> wrote:
>
> What version of torque are you running? I know I reported this issue to them
> and they partially fixed it in 4.2.4, but there are still some pretty major
> shortcomings in their handling of memory in cpusets. I've documented some
> problematic sample scenarios, and they're supposed to be thinking about how
> to address them.
>
> As a short-term fix, we rewrite the cpuset in the job prologue script to
> grant every job access to all of the node's mems when it starts. I suspect
> the Linux kernel will do a better job of handling memory allocation
> intelligently than torque does when allocating mems to cpusets.
>
> Derek Gottlieb
> HPC Systems Analyst, CSC
> Alabama Supercomputer Center
>
> 686 Discovery Dr., Huntsville, AL 35806
> High Performance Computing | dgottlieb at asc.edu | www.asc.edu
>
> On Aug 27, 2013, at 8:22 AM, François P-L wrote:
>
> > Hi,
> >
> > We are encountering some problems with jobs asking for too much memory.
> >
> > For example, a job asks for 4 CPUs and 126 GB of memory:
> > pbs_mom: LOG_INFO::create_job_cpuset, creating cpuset for job 235376[2]: 4 cpus (0-3), 1 mems (0)
> >
> > For my test I use "stress" with the following command:
> > stress -c 2 -t 600 --vm 2 --vm-bytes 61G
> >
> > My node has the following topology:
> > Machine (128GB)
> >   NUMANode L#0 (P#0 64GB) + Socket L#0 + L3 L#0 (20MB)
> >     L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
> >     L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
> >     L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
> >     L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
> >     L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
> >     L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
> >     L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
> >     L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
> >   NUMANode L#1 (P#1 64GB) + Socket L#1 + L3 L#1 (20MB)
> >     L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
> >     L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
> >     L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
> >     L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
> >     L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
> >     L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
> >     L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
> >     L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
> >
> > After a few seconds:
> > kernel: [517453.738199] stress invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
> > (...)
> > kernel: [517453.738204] stress cpuset=235376[2] mems_allowed=0
> > (...)
> >
> > After reading the qsub options, the "-n" option can "solve" the problem...
> > but it's a big waste of CPU in this case (the whole node is dedicated to
> > this job).
> >
> > Is there a way to allow a job to use all the memory of a node without using
> > all of its CPUs?
> >
> > Many thanks in advance.
> >
>
>
> --
> David Beer | Senior Software Engineer
> Adaptive Computing
>