[torqueusers] cgroup memory allocation problem

Brock Palen brockp at umich.edu
Thu Aug 9 16:18:12 MDT 2012

I filed this with Adaptive, but others should be aware of a major problem for high-memory jobs on pbs_moms using cgroups:

cgroups in Torque 4 are assigning memory banks on NUMA systems based on core layout only.


8-core, 48GB-memory, two-socket machine: valid cpus 0-7, valid mems 0-1

If a job is only on the first socket it is assigned mems 0; if it is on the second socket, mems 1; if a job is assigned cores on both sockets, it gets both.

The above is fine.
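
For anyone who wants to check this on their own nodes, the binding shows up in the per-job cpuset files pbs_mom creates.  The path below is how it looks on our nodes and is only illustrative; adjust it for wherever your cpuset hierarchy is mounted (the files may also be named cpuset.cpus / cpuset.mems depending on the mount):

    cat /dev/cpuset/torque/$PBS_JOBID/cpus    # e.g. 0-3  (cores on socket 0)
    cat /dev/cpuset/torque/$PBS_JOBID/mems    # e.g. 0    (memory bank 0 only)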

Now suppose I request 1 core and more memory than a single bank holds (the node has two 24GB memory banks):
qsub -l procs=1,mem=47gb

mems is set to 0 and cpus to 0.  When my job hits 24GB (the size of mems 0) it starts to swap rather than getting all of the memory it was assigned.
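
You can see the two 24GB banks with numactl; with mems pinned to 0 only the first bank is allocatable, so anything past roughly 24GB goes to swap:

    numactl --hardware    # look for "node 0 size" / "node 1 size", ~24GB each on these nodes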

A similar case:

On an empty node, if several single-core jobs all land on the same socket, they get assigned cpus 0, 1, and 2, but they all get mems 0, and the jobs swap.

Is there a way to just assign all NUMA nodes to jobs and only use CPU binding?  Currently we are most interested in CPU binding.
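
In case it helps anyone in the meantime, here is a rough, untested sketch of the kind of workaround I mean: a prologue script that widens the job's mems to everything the node has while leaving the cpu binding alone.  The cpuset location, the file names, and the assumption that pbs_mom has already created the job cpuset by the time the prologue runs are all guesses on my part:

    #!/bin/sh
    # Untested prologue sketch: widen the job's memory nodes, keep cpus as-is.
    # Assumes the per-job cpuset lives under /dev/cpuset/torque/<jobid>.
    jobid="$1"                          # prologue argument 1 is the job id
    jobset=/dev/cpuset/torque/$jobid
    if [ -d "$jobset" ]; then
        # copy the full mems list from the parent torque cpuset into the job's cpuset
        cat /dev/cpuset/torque/mems > "$jobset/mems"
    fi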

Brock Palen
CAEN Advanced Computing
brockp at umich.edu
