[torqueusers] vmem and pvmem
Gareth.Williams at csiro.au
Fri Feb 24 03:03:12 MST 2012
> -----Original Message-----
> From: Martin Siegert [mailto:siegert at sfu.ca]
> Sent: Thursday, 16 February 2012 10:11 AM
> To: torqueusers at supercluster.org
> Subject: [torqueusers] vmem and pvmem
>
> Hi,
>
> I am struggling with the implementation of the vmem and pvmem
> resources (in principle the exact same concerns apply to mem
> and pmem). Let's say I set
>
> resources_default.pvmem = 1gb
>
> in qmgr. Now a user submits a job requesting -l procs=1,vmem=2gb
> and the job fails because it exceeds the pvmem default. Apparently
> torque treats vmem and pvmem as two independent resources, which
> (in particular for 1p jobs) is not very reasonable.
> Similarly, if I would set resources_default.vmem, jobs that
> request pvmem fail even if the specified
> (amount of pvmem)*(no. of requested processors) > vmem default
>
> How do people deal with this issue?
> As far as I can tell moab only uses pmem and pvmem, i.e., moab
> converts vmem to pvmem = vmem/procct and mem to pmem = mem/procct.
> Correct? Shouldn't torque do the same?
> I am worried about shared memory jobs though - jobs where pvmem
> is not really relevant since all processes share the same memory
> and vmem is not simply the sum of the process memory usage (at
> least you cannot add up amounts displayed by ps, etc.). But I do
> not know whether torque handles this correctly anyway, does it?
>
> For now I modified the torque_submitfilter posted by Gareth
> http://www.clusterresources.com/pipermail/torquedev/2011-March/003479.html
> (thanks Gareth!) to add a qsub option -l pvmem=... in those
> cases when the user requests vmem, but this appears to be an ugly
> workaround. Shouldn't there be a better way?
>
> Cheers,
> Martin
Hi Martin,
I hoped someone else would bite but no joy!
We expect/advise people to only request vmem and set a
modest default vmem which forces them to explicitly specify
vmem in most cases. The pbs_resources man page describes vmem
as an aggregate limit across the whole job.
I wanted to be sure what actually happens when both vmem and
pvmem are requested (and when pvmem is omitted), so I ran a simple
test, starting a multi-cpu job and looking at the imposed 'ulimit'.
Core_req        vmem   pvmem   ulimit-v   RPT
===============================================
nodes=1:ppn=2   1gb    256mb   256mb      512mb
procs=2         1gb    256mb   256mb      1gb
nodes=1:ppn=2   1gb    4gb     1gb        4gb
procs=2         1gb    4gb     1gb        4gb
nodes=1:ppn=2   1gb    -       1gb        512mb
procs=2         1gb    -       1gb        1gb
So the ulimit value, which determines whether a task can allocate
memory, is set to the lower of the vmem and pvmem values. That
makes some sense - at least more sense than taking the larger
value. What doesn't make sense is allowing pvmem to be higher
than vmem in the first place. In that case torque should probably
reject the job or 'fix' one of the settings, though leaving it as is
might not be so bad, except for moab's behaviour (keep reading).
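The ulimit column in the table can be reproduced with a few lines of
Python. This is only a sketch of the behaviour observed in the test,
not torque's actual code: the function name is hypothetical, sizes are
assumed to be in megabytes, and None stands for "not requested".

```python
def observed_ulimit_mb(vmem_mb, pvmem_mb):
    """Return the lower of the requested vmem and pvmem values (in MB).

    Mirrors the 'ulimit-v' column of the test table; None means the
    resource was not requested. Hypothetical helper, not torque code.
    """
    requested = [v for v in (vmem_mb, pvmem_mb) if v is not None]
    return min(requested) if requested else None


# The three vmem/pvmem combinations from the table (1gb = 1024 MB).
for vmem, pvmem in [(1024, 256), (1024, 4096), (1024, None)]:
    print(vmem, pvmem, observed_ulimit_mb(vmem, pvmem))
```

The core request (nodes/ppn or procs) does not appear as an argument
because it made no difference to the ulimit in the test.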
I've noted in the past that setting the ulimit to vmem (the
aggregate limit for the job) is highly conservative, but someone
just might want to use most of the memory in a parallel job in one task.
I did not include in the test what torque does if you run multiple
tasks and exceed the vmem limit in aggregate. I suspect torque will
kill the job in that case once it notices, but the job can get to
that state as long as each task stays under the ulimit setting. Of
course the job might fill memory first...
The last column is the Resources Per Task (my shorthand) that moab
dedicates in its scheduling (you can see it in the output of
checkjob). As you can see this seems wacky:
- with nodes/ppn it is the larger of vmem/procct and pvmem
- with procs it is the larger of vmem and pvmem
In neither case do these limits agree with the ulimit set by torque.
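The two rules above can be written down and checked against the RPT
column of the table. This is a sketch of the observed behaviour only,
not moab's implementation: the function name and the uses_procs flag
are hypothetical, and sizes are assumed to be in megabytes.

```python
def moab_rpt_mb(vmem_mb, pvmem_mb, procct, uses_procs):
    """Resources Per Task (MB) as observed in checkjob output.

    uses_procs=True models a '-l procs=N' request, where vmem appears
    to be treated as a per-task value; uses_procs=False models
    '-l nodes=...:ppn=...', where vmem is divided by the task count.
    pvmem_mb=None means pvmem was not requested.
    """
    per_task_vmem = vmem_mb if uses_procs else vmem_mb // procct
    if pvmem_mb is None:
        return per_task_vmem
    return max(per_task_vmem, pvmem_mb)


# Rows of the table: (vmem, pvmem, uses_procs) with procct = 2.
rows = [(1024, 256, False), (1024, 256, True),
        (1024, 4096, False), (1024, 4096, True),
        (1024, None, False), (1024, None, True)]
for vmem, pvmem, uses_procs in rows:
    print(vmem, pvmem, uses_procs, moab_rpt_mb(vmem, pvmem, 2, uses_procs))
```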
Moab might also kill jobs if it thinks that limits are exceeded, but
it seems unlikely to get a chance.
The different treatment of nodes/ppn and procs by moab is a gotcha
and I'd consider it a bug. Moab seems to consider vmem to be a per
task setting rather than an aggregate if you specify procs.
I'd be interested to know if maui or pbs_sched dedicates a different
amount of vmem per task.
So all up, answering Martin's question further: apart from sticking
to vmem only, I'd advise also specifying pvmem only if you want to
explicitly state that the memory should be evenly divided (pretty
common!) and have the ulimit reflect that - and of course it should
be set to vmem/procct. Is that what your new filter does, Martin?
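For illustration, the vmem/procct calculation could look like the
sketch below. It is not the submit filter referred to in this thread:
the function name is hypothetical, and it assumes the size is given as
an integer with a kb, mb, gb or tb suffix. The result is returned in
kb so that uneven divisions still yield a whole number.

```python
import re

# Multipliers to kilobytes for the suffixes this sketch accepts.
_UNITS_KB = {"kb": 1, "mb": 1024, "gb": 1024 ** 2, "tb": 1024 ** 3}


def even_pvmem(vmem, procct):
    """Split an aggregate vmem request evenly over procct tasks.

    vmem is a string such as '2gb'; the return value is a string in
    kb suitable for a '-l pvmem=...' request. Hypothetical helper.
    """
    match = re.fullmatch(r"(\d+)(kb|mb|gb|tb)", vmem.strip().lower())
    if match is None or procct < 1:
        raise ValueError(f"cannot split vmem={vmem!r} over {procct} tasks")
    total_kb = int(match.group(1)) * _UNITS_KB[match.group(2)]
    return f"{total_kb // procct}kb"


print(even_pvmem("2gb", 4))  # 524288kb
```

With the single-processor case from the original question (procs=1,
vmem=2gb) this gives pvmem equal to vmem, so the two limits cannot
conflict.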
Just setting pvmem might be a good option, though it might not allow
for the asymmetric memory case...
Gareth
>
> --
> Martin Siegert
> Head, Research Computing
> Simon Fraser University
> Burnaby, British Columbia