[torqueusers] How to enforce pmem requirements
Gareth.Williams at csiro.au
Thu Feb 19 23:16:41 MST 2009
Hi All,
I've just posted on mauiusers and moabusers about scheduling using vmem. This might be a good option for you.
The post is at:
http://www.clusterresources.com/pipermail/mauiusers/2009-February/thread.html
Gareth Williams
CSIRO IM&T - ASC
> -----Original Message-----
> From: David Singleton [mailto:David.Singleton at anu.edu.au]
> Sent: Thursday, 19 February 2009 7:45 AM
> To: Roger Moye
> Cc: torqueusers at supercluster.org
> Subject: Re: [torqueusers] How to enforce pmem requirements
>
>
> Strictly speaking, pmem limits can't always stop nodes running out of
> memory even if enforced.
>
> a. A job can start an arbitrary number of processes none of which
> exceed the pmem limit.
>
> b. It is conceivable for apparently reasonable pmem limits to never
> be hit by a job that fills swap. Consider a 4 cpu node with
> 4GB of memory. A reasonable pmem limit would apparently be
> 1GB. However 4 processes growing memory use at the same rate
> will never reach that limit. They will start paging at some
> lower value and can continue paging until the node runs out
> of swap.
>
> My other problem with pmem (and mem) limits is that they are
> unpredictable.
> The same job running on the same node may run totally under the limit
> one run and hit the limit on another run. Process physical memory
> use depends not only on the job/process but also on the system state.
>
> Sorry for not being helpful.
>
> David
>
> Roger Moye wrote:
> >
> > We have Torque/Moab running on one cluster and Torque/Maui on another.
> > We encourage our users to use the pmem option to specify their memory
> > requirements in their PBS batch scripts. Is there a way to get the
> > scheduler to enforce these limits? That is, if a job attempts to exceed
> > the pmem value we want the scheduler to kill the job just like it would
> > if it exceeded its walltime. Currently we have a few users who have
> > their jobs exceed their pmem value and the result is trashed nodes
> > because the jobs have consumed too much memory.
> >
> > Thanks in advance for any help or advice!
> > -Roger
> >
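
David's points (a) and (b) above can be checked with a little arithmetic. The following is only a rough sketch, not a model of how Torque, Maui or Moab account for memory: it assumes the 4 cpu, 4GB node from his example, plus a hypothetical 2GB of swap and 256MB of memory held by the OS (neither figure is from the thread), and grows all processes in lockstep. The function name and all parameters are made up for illustration.

```python
def simulate(ram_mb=4096, swap_mb=2048, os_reserved_mb=256,
             nprocs=4, pmem_limit_mb=1024, step_mb=64):
    """Grow nprocs processes at the same rate until a pmem limit trips
    or swap fills. Returns (outcome, peak resident MB per process).

    swap_mb and os_reserved_mb are assumed values, not from the thread.
    """
    usable_ram = ram_mb - os_reserved_mb  # RAM left for the job's processes
    virt = 0       # memory allocated so far by each process, in MB
    peak_rss = 0   # largest per-process resident size seen, in MB
    while True:
        virt += step_mb
        total_virt = virt * nprocs
        # Resident pages cannot exceed the usable RAM; the rest is paged out.
        total_rss = min(total_virt, usable_ram)
        rss = total_rss // nprocs
        swapped = total_virt - total_rss
        peak_rss = max(peak_rss, rss)
        if rss >= pmem_limit_mb:
            return "pmem limit hit", peak_rss
        if swapped >= swap_mb:
            return "swap exhausted", peak_rss


print(simulate())           # ('swap exhausted', 960)
print(simulate(nprocs=1))   # ('pmem limit hit', 1024)
print(simulate(nprocs=8))   # ('swap exhausted', 480)
```

Under these assumptions the four processes level off at 960MB resident each, just under the 1GB pmem limit, and keep paging until swap is gone. A single process trips the limit as expected, while eight smaller processes (point a) never get anywhere near it.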