[torqueusers] vmem and pvmem

Gareth.Williams at csiro.au Gareth.Williams at csiro.au
Fri Feb 24 03:03:12 MST 2012

> -----Original Message-----
> From: Martin Siegert [mailto:siegert at sfu.ca]
> Sent: Thursday, 16 February 2012 10:11 AM
> To: torqueusers at supercluster.org
> Subject: [torqueusers] vmem and pvmem
> Hi,
> I am struggling with the implementation of the vmem and pvmem
> resources (in principle the exact same concerns apply to mem
> and pmem). Let's say I set
> resources_default.pvmem = 1gb
> in qmgr. Now a user submits a job requesting -l procs=1,vmem=2gb
> and the job fails because it exceeds the pvmem default. Apparently
> torque treats vmem and pvmem as two independent resources, which
> (in particular for 1p jobs) is not very reasonable.
> Similarly, if I would set resources_default.vmem, jobs that
> request pvmem fail even if the specified
> (amount of pvmem)*(no. of requested processors) > vmem default
> How do people deal with this issue?
> As far as I can tell moab only uses pmem and pvmem, i.e., moab
> converts vmem to pvmem = vmem/procct and mem to vmem = mem/procct.
> Correct? Shouldn't torque do the same?
> I am worried about shared memory jobs though - jobs where pvmem
> is not really relevant since all processes share the same memory
> and vmem is not simply the sum of the process memory usage (at
> least you cannot add up amounts displayed by ps, etc.). But I do
> not know whether torque handles this correctly anyway, does it?
> For now I modified the torque_submitfilter posted by Gareth
> http://www.clusterresources.com/pipermail/torquedev/2011-March/003479.html
> (thanks Gareth!) to add a qsub option -l pvmem=... in those
> cases when the user requests vmem, but this appears to be an ugly
> workaround. Shouldn't there be a better way?
> Cheers,
> Martin

Hi Martin,

I hoped someone else would bite but no joy!

We advise people to request only vmem, and we set a modest 
default vmem, which forces them to specify vmem explicitly in 
most cases. The pbs_resources man page describes vmem as an 
aggregate limit across the whole job.

I wanted to be sure what actually happens with both vmem and 
pvmem requested (and also with pvmem left unspecified), so I ran 
a simple test, starting a multi-cpu job and looking at the 
imposed 'ulimit'.
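(For anyone repeating the test: a job can report the limit it was 
actually given with a few lines of Python run inside the job script - 
a sketch only, and the function name is mine.)

```python
import resource

def vmem_ulimit():
    """Report the soft address-space limit (what 'ulimit -v' shows),
    in kilobytes, or 'unlimited' if no limit was imposed."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return "unlimited" if soft == resource.RLIM_INFINITY else soft // 1024

print(vmem_ulimit())
```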

Core_req       vmem  pvmem ulimit-v RPT
nodes=1:ppn=2  1gb   256mb 256mb    512mb
procs=2        1gb   256mb 256mb    1gb
nodes=1:ppn=2  1gb   4gb   1gb      4gb
procs=2        1gb   4gb   1gb      4gb
nodes=1:ppn=2  1gb   -     1gb      512mb
procs=2        1gb   -     1gb      1gb

So the ulimit value that determines whether a task can allocate 
memory is set to the lower of the vmem and pvmem values. That 
makes some sense - at least more sense than taking the larger 
value. What doesn't make sense is allowing pvmem to be higher 
than vmem in the first place - in that case torque should probably 
reject the job or 'fix' one of the settings, but leaving it as is 
might not be so bad, except for moab's behaviour (keep reading).
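In other words, the rule the table suggests is just a min() - here is 
a sketch in Python of my reading of the behaviour, not anything lifted 
from the torque source:

```python
# Effective per-process limit torque appears to impose:
# the lower of vmem and pvmem when both are requested.
UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3}

def to_bytes(size):
    """'256mb' -> 268435456"""
    return int(size[:-2]) * UNITS[size[-2:]]

def effective_ulimit(vmem=None, pvmem=None):
    limits = [to_bytes(s) for s in (vmem, pvmem) if s is not None]
    return min(limits) if limits else None

# Rows from the table above: (vmem, pvmem) -> ulimit -v
assert effective_ulimit("1gb", "256mb") == to_bytes("256mb")
assert effective_ulimit("1gb", "4gb") == to_bytes("1gb")
assert effective_ulimit("1gb") == to_bytes("1gb")
```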

I've noted in the past that setting the ulimit to vmem (the 
aggregate limit for the job) is highly conservative, but someone 
just might want to use most of the memory in a parallel job in one task.

I did not include in the test what torque does if you run multiple 
tasks and exceed the vmem limit in aggregate. I suspect torque will 
kill the job when it notices, but the job can get into that state 
as long as each task stays under the ulimit setting. Of course 
the job might fill memory first...

The last column is the Resources Per Task (my shorthand) that moab 
dedicates in its scheduling (you can see it in the output of 
checkjob). As you can see, this seems wacky:
- with nodes/ppn it is the larger of vmem/procct and pvmem
- with procs it is the larger of vmem and pvmem
In neither case do these limits agree with the ulimit set by torque.
Moab might also kill jobs if it thinks limits are exceeded, but it 
seems unlikely to get the chance.

The different treatment of nodes/ppn and procs by moab is a gotcha 
and I'd consider it a bug. Moab seems to consider vmem to be a per 
task setting rather than an aggregate if you specify procs.

I'd be interested to know if maui or pbs_sched dedicates a different 
amount of vmem per task.

So, all up, answering Martin's question further: apart from sticking 
to vmem only, I'd advise also specifying pvmem only if you want to 
state explicitly that the memory should be divided evenly (pretty 
common!) and have the ulimit reflect that - and of course pvmem 
should then be set to vmem/procct. Is that what your new filter 
does, Martin?
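For what it's worth, the arithmetic such a filter needs is tiny. 
A sketch in Python (the names are mine, and a real submit filter 
would parse the -l directives out of the job script on stdin 
rather than take arguments):

```python
import math

UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3}

def parse_size(size):
    """'2gb' -> bytes"""
    return int(size[:-2]) * UNITS[size[-2:]]

def derive_pvmem(vmem, procct):
    """Per-process limit to add when the user gave only vmem:
    pvmem = vmem/procct, rounded up to a whole megabyte."""
    per_task_mb = math.ceil(parse_size(vmem) / procct / UNITS["mb"])
    return "%dmb" % per_task_mb

# e.g. a job submitted with -l procs=4,vmem=2gb would gain -l pvmem=512mb
assert derive_pvmem("2gb", 4) == "512mb"
```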

Just setting pvmem might be a good option, though it might not 
allow for the asymmetric memory case...


> --
> Martin Siegert
> Head, Research Computing
> Simon Fraser University
> Burnaby, British Columbia
