[Mauiusers] vmem confusion

Martin Thompson martin.thompson at unsw.edu.au
Sat Aug 14 22:46:51 MDT 2010


Hi there

I am a little confused about how Maui and Torque interpret vmem,
particularly in the case of multithreaded jobs.

I have a tiny cluster of virtual machines for experimenting with
scheduling policies.  The cluster uses Rocks 5.3 with the Torque roll
that provides Torque 2.4.6 and Maui 3.2.6p21.  The only twist is that I
rebuilt the Torque RPM so that I could pass --enable-cpuset to the
configure script.

Each compute node has 2 CPUs and 2 GB of memory, and I have been
experimenting with a multithreaded program (it uses Intel MKL or ATLAS)
that requires 1.5 GB of memory.  I submit the job using nodes=1:ppn=2 and
vmem=1600mb.  If I don't ask Maui to enforce any resource limits and
leave that to Torque and rlimit, then everything is ok and the program
runs to completion.  I then considered using Maui to enforce the
resource limits...

ENFORCERESOURCELIMITS ON
RESOURCELIMITPOLICY   SWAP:ALWAYS:CANCEL

This was mainly because I was interested in providing better
feedback to users when they exceed the memory they requested.  So with
Maui in charge of vmem resource limits, and ignvmem=true for Torque, I
submitted my job again.  However, this time the job was killed and the
following appeared in the Maui log...

job 56 exceeds requested swap limit (1548 > 800)
      job '56' in state 'Running' has exceeded SWAP resource limit (1548 > 800)
      (action CANCEL will be taken)
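
The reported limit of 800 is exactly half of the 1600mb I requested, which
is what you would get if the job-wide vmem were divided by ppn=2.  A minimal
Python sketch of that arithmetic follows.  The per-task division is only a
guess at what Maui might be doing, not confirmed behaviour, and the function
name is made up for illustration.

```python
# Hypothetical helper: assumes (unconfirmed) that a job-wide vmem request
# is split evenly across the requested processors.
def per_task_limit_mb(vmem_mb: int, ppn: int) -> int:
    return vmem_mb // ppn

requested_vmem_mb = 1600   # vmem=1600mb from the job submission
ppn = 2                    # nodes=1:ppn=2
observed_usage_mb = 1548   # usage reported in the Maui log

limit_mb = per_task_limit_mb(requested_vmem_mb, ppn)
print(limit_mb)                      # 800, the limit shown in the log
print(observed_usage_mb > limit_mb)  # True, so CANCEL would be triggered
```

If that guess is right, the job is being judged against a per-task figure
while its usage is counted for the whole job.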

If I try with pvmem=1600mb then the job will never run because there is
no compute node with 2 x 1600mb of memory.
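
To spell out that arithmetic, here is a small Python sketch.  It assumes
that pvmem is a per-process limit multiplied by ppn, and that a node offers
only its 2 GB of memory (taken as 2048 MB) with swap ignored.  The function
name is made up for illustration.

```python
# Hypothetical check: does a per-process vmem request fit on one node?
# Assumes total demand = pvmem * ppn and ignores any swap on the node.
def fits_on_node(pvmem_mb: int, ppn: int, node_mem_mb: int) -> bool:
    return pvmem_mb * ppn <= node_mem_mb

node_mem_mb = 2 * 1024  # 2 GB per compute node, taken as 2048 MB
print(1600 * 2)                            # 3200 MB needed in total
print(fits_on_node(1600, 2, node_mem_mb))  # False, so the job stays queued
```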

Interestingly, if I ask Maui to enforce limits on MEM rather than SWAP,
and I request mem instead of vmem, then everything appears to be ok.
However, I can see problems ahead if jobs are constrained by mem
rather than vmem, so I don't particularly want to go in that direction.

Can anyone please identify my embarrassing mistake?

Many thanks

Martin


Torque config:

#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
set server acl_hosts = XXX.XXX.XXX.XXX
set server managers = maui@XXX.XXX.XXX.XXX
set server managers += root@XXX.XXX.XXX.XXX
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.walltime = 01:00:00
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 80

Maui config:

RMPOLLINTERVAL        00:00:15

SERVERHOST            XXX.XXX.XXX.XXX
SERVERPORT            42559
SERVERMODE            NORMAL

RMCFG[base]           TYPE=PBS

ADMIN1                maui root
ADMIN3                ALL

LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3

QUEUETIMEWEIGHT       1

BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST

NODEALLOCATIONPOLICY  MINRESOURCE

ENFORCERESOURCELIMITS ON
#RESOURCELIMITPOLICY   MEM:ALWAYS:CANCEL
RESOURCELIMITPOLICY   SWAP:ALWAYS:CANCEL



