[torqueusers] How to enforce pmem requirements

Gareth.Williams at csiro.au
Thu Feb 19 23:16:41 MST 2009


Hi All,

I've just posted on mauiusers and moabusers about scheduling using vmem.  This might be a good option for you.
The post is at:
http://www.clusterresources.com/pipermail/mauiusers/2009-February/thread.html

Gareth Williams
CSIRO IM&T - ASC


> -----Original Message-----
> From: David Singleton [mailto:David.Singleton at anu.edu.au]
> Sent: Thursday, 19 February 2009 7:45 AM
> To: Roger Moye
> Cc: torqueusers at supercluster.org
> Subject: Re: [torqueusers] How to enforce pmem requirements
> 
> 
> Strictly speaking, pmem limits can't always stop nodes running out of
> memory, even if enforced.
> 
>   a. A job can start an arbitrary number of processes none of which
>      exceed the pmem limit.
> 
>   b. It is conceivable for apparently reasonable pmem limits to never
>      be hit by a job that fills swap.  Consider a 4 cpu node with
>      4GB of memory.  A reasonable pmem limit would apparently be
>      1GB.  However 4 processes growing memory use at the same rate
>      will never reach that limit.  They will start paging at some
>      lower value and can continue paging until the node runs out
>      of swap.
> 
> My other problem with pmem (and mem) limits is that they are
> unpredictable.  The same job running on the same node may run totally
> under the limit on one run and hit the limit on another run.  Process
> physical memory use depends not only on the job/process but also on
> the system state.
> 
> Sorry for not being helpful.
> 
> David
> 
> Roger Moye wrote:
> >
> > We have Torque/Moab running on one cluster and Torque/Maui on another.
> > We encourage our users to use the pmem option to specify their memory
> > requirements in their PBS batch scripts.  Is there a way to get the
> > scheduler to enforce these limits?  That is, if a job attempts to exceed
> > the pmem value we want the scheduler to kill the job just like it would
> > if it exceeded its walltime.  Currently we have a few users who have
> > their jobs exceed their pmem value and the result is trashed nodes
> > because the jobs have consumed too much memory.
> >
> > Thanks in advance for any help or advice!
> > -Roger
> >
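
Point (b) in David's reply can be illustrated with a toy model.  This is
only a sketch under stated assumptions, not how Torque, Maui or the kernel
actually account for memory: the 4GB of swap, the 0.5GB reserved for the
system, the growth step and the function name simulate() are all made up
for illustration.  Resident memory is assumed to be shared evenly between
the processes, with the excess going to swap.

```python
def simulate(ram_gb=4.0, swap_gb=4.0, nprocs=4, pmem_limit_gb=1.0,
             reserved_gb=0.5, step_gb=0.125):
    """Grow nprocs processes in lockstep; return (outcome, peak_rss_gb).

    Assumptions (not from the thread): swap size, memory reserved for the
    OS, and an even split of resident memory between processes.
    """
    usable_ram = ram_gb - reserved_gb   # RAM actually available to the job
    virt = 0.0                          # per-process allocated memory
    peak_rss = 0.0                      # highest per-process resident size
    while True:
        virt += step_gb
        total = virt * nprocs
        resident_total = min(total, usable_ram)  # RAM fills first
        swapped = total - resident_total         # the rest is paged out
        rss = resident_total / nprocs
        peak_rss = max(peak_rss, rss)
        if rss > pmem_limit_gb:
            return "pmem limit hit", peak_rss
        if swapped > swap_gb:
            return "swap exhausted", peak_rss


if __name__ == "__main__":
    print(simulate())           # four processes on the 4GB node
    print(simulate(nprocs=1))   # a single process on the same node
```

With four processes, the per-process resident size in this model tops out
below the 1GB limit, so the limit never fires and the run ends when swap is
full.  With a single process the limit is reached well before the node is
in trouble.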


