[torqueusers] cgroup memory allocation problem

Gareth.Williams at csiro.au
Sun Aug 12 19:55:15 MDT 2012


> -----Original Message-----
> From: Brock Palen [mailto:brockp at umich.edu]
> Sent: Friday, 10 August 2012 8:18 AM
> To: Torque Users Mailing List
> Subject: [torqueusers] cgroup memory allocation problem
> 
> I filed this with adaptive but others should be aware of a major
> problem for high memory use jobs on pbs_moms using cgroups:
> 
> cgroups in torque4 are assigning memory banks in numa systems based on
> core layout only.
> 
> Example:
> 
> 8 core, 48GB memory, two socket machine: valid cpus 0-7, valid mems 0-1
> 
> If a job is only on the first socket, it is assigned to mems 0; if it
> is on the second, mems 1; if a job is assigned cores on both, it is
> assigned both.
> 
> The above is fine,
> 
> Now suppose I request 1 core and more memory than one bank holds, on a
> node with two 24GB memory banks:
> 
> qsub -l procs=1,mem=47gb
> 
> mems is set to 0 and cpus to 0. When my job hits 24GB (the size of
> mems 0) it starts to swap instead of getting all of the memory it was
> assigned.
> 
> A similar case:
> procs=1,mem=20gb
> procs=1,mem=20gb
> procs=1,mem=20gb
> 
> On an empty node, if they all land on the same node, they get assigned
> cpus 0, 1, and 2, but all get mems 0, and the jobs swap.
> 
> Is there a way to just assign all NUMA nodes to jobs and only use CPU
> binding?  Currently we are most interested in CPU binding.
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> brockp at umich.edu
> (734)936-1985

Hi Brock,

For reference, we've noticed something related on our UV system. To work around it, we set the NUMA virtual node configuration so that each virtual node corresponds to a socket, and we ask (or force) users to request whole nodes, except for jobs with low processor counts and low memory. The machine topology would be better reflected if we defined virtual nodes to correspond to socket pairs.
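For illustration, here is a rough sketch of that kind of layout, assuming TORQUE's NUMA build with num_node_boards in the server's nodes file and a matching mom.layout on the mom (the host name and core counts here are made up; check the docs for your version):

    # server_priv/nodes -- hypothetical 8-core, two-socket host
    node01 np=8 num_node_boards=2

    # mom_priv/mom.layout on node01 -- one virtual node per socket,
    # pairing each group of cores with its local memory bank
    cpus=0-3 mem=0
    cpus=4-7 mem=1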

> Is there a way to just assign all NUMA nodes to jobs and only use CPU
> binding?  Currently we are most interested in CPU binding.

You could use a submit filter to round requests up to full nodes, or to reject jobs that would straddle memory banks.  You could also use the prologue to alter the job's existing cpuset to include more mems.  Rough sketches of both follow.
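A minimal submit filter sketch, assuming the filter is wired up via torque.cfg; qsub pipes the job script through stdin, and a non-zero exit rejects the job (the 24GB bank size and the mem= pattern are assumptions based on your example node):

    #!/bin/bash
    # Pass the job script through, rejecting single-core jobs that ask
    # for more memory than one 24GB bank can supply.
    while read -r line; do
        if [[ "$line" =~ ^#PBS.*procs=1, ]] && \
           [[ "$line" =~ mem=([0-9]+)gb ]] && \
           (( BASH_REMATCH[1] > 24 )); then
            echo "mem request spans memory banks; request a whole node" >&2
            exit 1
        fi
        echo "$line"
    done
    exit 0

And a prologue fragment along the same lines, assuming a torque 3 style cpuset mounted at /dev/cpuset (on our systems the job's cpuset appears under /dev/cpuset/torque/<jobid>; the "0-1" matches the two mems on the example node):

    #!/bin/bash
    # $1 is the job id.  Widen the job's cpuset to all memory banks so
    # the kernel can allocate from either bank while keeping the CPU
    # binding intact.
    cpuset_dir=/dev/cpuset/torque/$1
    if [ -w "$cpuset_dir/mems" ]; then
        echo 0-1 > "$cpuset_dir/mems"
    fi
    exit 0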

Note that we are running a torque 3 version with cpusets rather than cgroups per se, if that matters.

Gareth

