[torqueusers] How to enforce pmem requirements

David Singleton David.Singleton at anu.edu.au
Fri Feb 20 00:35:12 MST 2009


Hi Gareth,

My reading of your post:

  "With the patch, maui/moab correctly detects a node's available vmem
   as being the total physical memory plus swap space. This can then be
   scheduled/allocated by requesting vmem, and a job's virtual memory
   allocation can be limited on a per-process basis and periodically
   measured (and action taken on overuse) on a per-job basis."

is that jobs are able to thrash away in swap under these limits.
Probably not what you want.

I guess I would say vmem has nothing much to do with swap.  Huh?!
vmem (as the term is used in PBS) refers to the virtual address
space of a process (VSZ) and, for better or worse, a job's vmem is
the sum of those per-process values.  Swap, by contrast, is used by
physical pages - it's where physical pages that don't fit in memory go.
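
To make that concrete, here is a rough sketch (Python; the PID list
for a job is hypothetical - in practice it comes from whatever tracks
the job's processes) of what job vmem amounts to under that
definition, i.e. just the sum of per-process VSZ as the kernel
reports it:

    def process_vsz_kb(pid):
        # VmSize in /proc/<pid>/status is the process's VSZ, in kB.
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("VmSize:"):
                    return int(line.split()[1])
        return 0

    def job_vmem_kb(pids):
        # Job vmem, PBS-style: the sum of each process's virtual size.
        return sum(process_vsz_kb(pid) for pid in pids)

    # e.g. job_vmem_kb([12345, 12346]) for the processes in one job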

The useful thing about process VSZ is that it is an upper bound on
a process's physical memory use (resident + swapped pages).  We use a
conservative vmem allocation scheme where the sum of the vmems of
running jobs(*) has to fit in the physical memory of the nodes those
jobs run on.   AFAICT, currently, that's the only way of guaranteeing
you won't run out of memory or suffer swap thrashing:

sum job physical memory <= sum job virtual memory <= node physical memory
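
A minimal sketch of that allocation rule (the names are made up; the
real scheduler logic is obviously more involved):

    def can_start(new_job_vmem_kb, running_vmems_kb, node_phys_mem_kb):
        # Conservative admission check: the vmem requests of the jobs
        # already on the node, plus the new job's, must fit in physical
        # memory.  Since vmem bounds physical use, this avoids swapping.
        return sum(running_vmems_kb) + new_job_vmem_kb <= node_phys_mem_kb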

If Linux provided a measure of process physical memory use (RSS+swap),
then that is what should be limited directly, i.e. PBS mem should be
defined as (RSS+swap) instead of just RSS, with the constraint:

       sum job physical memory (mem) <= node physical memory

Hopefully that is what we will get with cgroup memory controllers.
Or maybe that is what your patch is using now?
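
For what it's worth, the cgroup memory controller (with swap
accounting enabled) exposes exactly that kind of RSS+swap cap.  A
rough sketch, assuming the controller is mounted at /cgroup/memory
and using a made-up per-job group name:

    import os

    CGROUP_MEM = "/cgroup/memory"   # assumed mount point

    def cap_job_memory(job_id, pids, limit_bytes):
        # memory.limit_in_bytes caps resident memory; with swap
        # accounting, memory.memsw.limit_in_bytes caps resident plus
        # swapped pages, i.e. the (RSS+swap) quantity discussed above.
        cg = os.path.join(CGROUP_MEM, "job_%s" % job_id)
        if not os.path.isdir(cg):
            os.mkdir(cg)
        with open(os.path.join(cg, "memory.limit_in_bytes"), "w") as f:
            f.write(str(limit_bytes))
        with open(os.path.join(cg, "memory.memsw.limit_in_bytes"), "w") as f:
            f.write(str(limit_bytes))
        for pid in pids:
            with open(os.path.join(cg, "tasks"), "w") as f:
                f.write("%d\n" % pid)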

Cheers,
David

(*) We do a fair bit of doctoring of the vmem evaluation for shared maps etc.
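
(One plausible form of such doctoring, purely as an illustration, is
to count a shared file-backed mapping once per job rather than once
per process.  A crude sketch from /proc/<pid>/maps - any real
evaluation would handle more cases than this:)

    def job_vmem_shared_once_kb(pids):
        seen = set()
        total_kb = 0
        for pid in pids:
            with open("/proc/%d/maps" % pid) as f:
                for line in f:
                    fields = line.split()
                    start, end = (int(a, 16) for a in fields[0].split("-"))
                    size_kb = (end - start) // 1024
                    offset, dev, inode = fields[2], fields[3], fields[4]
                    if inode == "0":
                        # anonymous mapping: charge every process
                        total_kb += size_kb
                    else:
                        # file-backed mapping (libraries etc.): charge
                        # it once per job, not once per process
                        key = (dev, inode, offset, size_kb)
                        if key not in seen:
                            seen.add(key)
                            total_kb += size_kb
        return total_kb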


Gareth.Williams at csiro.au wrote:
> Hi All,
> 
> I've just posted to mauiusers and moabusers about scheduling using vmem.  This might be a good option for you.
> The post is at:
> http://www.clusterresources.com/pipermail/mauiusers/2009-February/thread.html
> 
> Gareth Williams
> CSIRO IM&T - ASC
> 
> 
>> -----Original Message-----
>> From: David Singleton [mailto:David.Singleton at anu.edu.au]
>> Sent: Thursday, 19 February 2009 7:45 AM
>> To: Roger Moye
>> Cc: torqueusers at supercluster.org
>> Subject: Re: [torqueusers] How to enforce pmem requirements
>>
>>
>> Strictly speaking, pmem limits can't always stop nodes from running out
>> of memory even if they are enforced.
>>
>>   a. A job can start an arbitrary number of processes none of which
>>      exceed the pmem limit.
>>
>>   b. It is conceivable for apparently reasonable pmem limits to never
>>      be hit by a job that fills swap.  Consider a 4-CPU node with
>>      4GB of memory.  A reasonable pmem limit would apparently be
>>      1GB.  However, 4 processes growing their memory use at the same
>>      rate will never reach that limit.  They will start paging at
>>      some lower value and can continue paging until the node runs
>>      out of swap.
>>
>> My other problem with pmem (and mem) limits is that they are
>> unpredictable.  The same job running on the same node may stay well
>> under the limit on one run and hit the limit on another.  Process
>> physical memory use depends not only on the job/process but also on
>> the system state.
>>
>> Sorry for not being helpful.
>>
>> David
>>
>> Roger Moye wrote:
>>> We have Torque/Moab running on one cluster and Torque/Maui on another.
>>> We encourage our users to use the pmem option to specify their memory
>>> requirements in their PBS batch scripts.  Is there a way to get the
>>> scheduler to enforce these limits?  That is, if a job attempts to exceed
>>> the pmem value, we want the scheduler to kill the job just like it would
>>> if it exceeded its walltime.  Currently we have a few users whose jobs
>>> exceed their pmem value, and the result is trashed nodes because the
>>> jobs have consumed too much memory.
>>>
>>> Thanks in advance for any help or advice!
>>> -Roger
>>>
> 


-- 
--------------------------------------------------------------------------
    Dr David Singleton               ANU Supercomputer Facility
    HPC Systems Manager              and NCI National Facility
    David.Singleton at anu.edu.au       Leonard Huxley Bldg (No. 56)
    Phone: +61 2 6125 4389           Australian National University
    Fax:   +61 2 6125 8199           Canberra, ACT, 0200, Australia
--------------------------------------------------------------------------

