[torqueusers] Confused about vmem
Diego M. Vadell
dvadell at linuxclusters.com.ar
Mon Sep 29 19:54:17 MDT 2008
Hi list,
We are starting to restrict the amount of RAM jobs can use with torque. Our
nodes have 4 cores and 4 GB of RAM each.
I ran some tests setting a default of pvmem=1GB via qmgr, and it seemed to be
what we were looking for, until we considered a particular job.
This job uses OpenMP. It is a single process that needs a whole node: all
4 GB and all 4 cores.
If I ask for nodes=1:ppn=4,pvmem=1G, torque will give me a whole node, but
it will kill this job's only process when it uses more than 1 GB. If I ask for
nodes=1:ppn=4,pvmem=4G, torque will look for a node with 16 GB of RAM, and I
don't have one of those.
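To spell out the arithmetic behind those two requests, here is a small sketch. It assumes the memory a node must offer is simply ppn times pvmem, which matches the behaviour described above but is not taken from the torque sources. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: memory (in GB) a node must offer for a request,
# ASSUMING the requirement is ppn * pvmem as observed above.
node_mem_needed_gb() {
    ppn=$1        # processors per node requested
    pvmem_gb=$2   # per-process virtual memory limit, in GB
    echo $((ppn * pvmem_gb))
}

node_mem_needed_gb 4 1   # nodes=1:ppn=4,pvmem=1G -> prints 4
node_mem_needed_gb 4 4   # nodes=1:ppn=4,pvmem=4G -> prints 16
```

The first request fits a 4 GB node but caps the single process at 1 GB. The second would give the process 4 GB but needs a 16 GB node.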
So I found vmem. As I can't run many tests right now (I have to schedule
downtime to do that safely), I ran a few small ones with a program called
memtest, which just mallocs and fills memory:
1) Got an interactive session with vmem=1GB and ran 1 memtest:
qsub -I -l nodes=1:ppn=2 -l vmem=1000mb
Command line: examples/memtest
Result: malloc failure after 993 MiB
Expected: This is OK, as vmem is the total amount of virtual memory the whole
job can use.
2) Got an interactive session with vmem=1GB and ran 2 memtests in parallel:
qsub -I -l nodes=1:ppn=2 -l vmem=1000mb
Command line: examples/memtest & examples/memtest
Result: malloc failure after 993 MiB
malloc failure after 993 MiB
Expected: So both can allocate almost 1 GB! I was expecting them to die at
500 MB each, so that together they would add up to the 1 GB maximum I asked
torque for.
3) Got an interactive session with pvmem=1GB and ran 2 memtests in parallel:
qsub -I -l nodes=1:ppn=2 -l pvmem=1000mb
Command line: examples/memtest & examples/memtest
Result: malloc failure after 993 MiB
malloc failure after 993 MiB
Expected: This is what I was expecting, because each process can allocate
about 1 GB.
So vmem and pvmem seem to do the same thing. What am I missing here?
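One possible reading of these results, offered only as an assumption and not confirmed here, is that in both cases the limit ends up applied to each process as an address-space limit, the way the shell's ulimit -v works. The sketch below (the function name is made up) only shows that such a limit belongs to each process separately and is not shared or summed across processes. It says nothing definitive about what torque does internally.

```shell
#!/bin/sh
# ASSUMPTION: the limit behaves like a per-process address-space limit,
# which is what `ulimit -v` sets. The value is given in KiB.

# Hypothetical helper: set the limit inside a child shell and print the
# limit that child then sees.
limit_in_child() {
    ( ulimit -v "$1" && ulimit -v )
}

# Two children, each given 1000 MiB (1000 * 1024 KiB). Each reports its
# own full limit; neither is reduced because the other exists.
limit_in_child 1024000
limit_in_child 1024000
```

If that reading holds, two memtest processes under the same limit would each fail near 1 GB rather than near 500 MB, which is what tests 2 and 3 show.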
Thanks in advance,
-- Diego.