[Mauiusers] Question on maui.cfg NODEALLOCATION policy

Brad Mecklenburg bmecklenburg at colsa.com
Thu Feb 8 07:48:10 MST 2007


I have one 256-node cluster, but the hardware is split into two halves of
128 nodes each. Part A and Part B differ in number of processors,
processor speed, memory...  Users specify which half of the cluster to run
on in the PBS submit script, based on the attributes set in the
server_priv/nodes file.
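
To make the setup above concrete, here is a minimal sketch of such a submit script. The property name partA, the node names in the comments, the job name, and the walltime are hypothetical placeholders, not the values from my actual server_priv/nodes file.

```shell
#!/bin/sh
# Hypothetical server_priv/nodes entries this script assumes:
#   node001 np=4 partA
#   node129 np=2 partB
#
#PBS -N parta_example
#PBS -l nodes=32:ppn=2:partA
#PBS -l walltime=00:30:00

# PBS_O_WORKDIR is set by the batch system; fall back to "." elsewhere.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# PBS_NODEFILE has one line per allocated processor slot, so counting the
# unique hostnames shows how many distinct nodes the job actually received.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NODE_COUNT=$(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')
else
    NODE_COUNT=0
fi
echo "distinct nodes allocated: $NODE_COUNT"
```

Printing the distinct node count from inside the job is just one way to see whether two jobs ended up packed onto the same nodes.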

I'm having a little trouble with the NODEALLOCATIONPOLICY setting, I think.
When I set it to MINRESOURCE, jobs can run on Part A of the cluster, but the
nodes in this half have 4 CPUs, so when a user submits, say, 32-node jobs,
the jobs land on the same nodes and all 4 CPUs are used, which is not what I
want. Set up this way, jobs on Part B of the cluster start immediately and
complete fine.  So I changed NODEALLOCATIONPOLICY to CPULOAD.  Now Part A of
the cluster behaves how I want: if a user submits two 32-node jobs with 2
processors, the jobs are split up. But I have a problem running on Part B of
the cluster. When I submit a job to this side of the cluster, the job stays
in the QUEUED state according to qstat.  The maui log will say something
like 256 feasible tasks found (running a 128 node ppn=2 job) but the next
line will say something like "inadequate tasks found for job whatever
2< 256".  However, if I qrun the job number, the job will run fine and
complete successfully.

I guess I have a couple of questions about the job staying in the queue.
Why does it stay queued when NODEALLOCATIONPOLICY is set to CPULOAD?  There
is only one job running on this half of the cluster.  Should I set
NODEALLOCATIONPOLICY to another value for this to work properly?

I will put below my maui.cfg so if anyone sees anything that would be
causing this type of behavior, please let me know.  I have not made many
changes to the maui.cfg.  It is pretty much just a basic setup. Any info
would be appreciated. Thanks.

# maui.cfg 3.2.6p18

SERVERHOST            marvin
# primary admin must be first in list
ADMIN1                root brad jbennett cpd

# Resource Manager Definition

RMCFG[marvin] TYPE=PBS
RMCFG[marvin] TIMEOUT=30
JOBAGGREGATIONTIME    00:00:10
# Allocation Manager Definition

AMCFG[bank]  TYPE=NONE

# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration

RMPOLLINTERVAL        00:01:00

SERVERPORT            42559
SERVERMODE            NORMAL

# Admin: http://supercluster.org/mauidocs/a.esecurity.html


LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3

# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html

QUEUETIMEWEIGHT       1

# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html

#FSPOLICY              PSDEDICATED
#FSDEPTH               7
#FSINTERVAL            86400
#FSDECAY               0.80

# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html

# NONE SPECIFIED

# Backfill: http://supercluster.org/mauidocs/8.2backfill.html

BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST

# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html

NODEALLOCATIONPOLICY  CPULOAD

# QOS: http://supercluster.org/mauidocs/7.3qos.html

# QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE

# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html

# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test]   17:00:00
# SRDAYS[test]      MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test]   0:30:00

# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html

# USERCFG[DEFAULT]      FSTARGET=25.0
# USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
# GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch]       FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR

## Additions made to the maui config file upon build. Brad
NODEAVAILABILITYPOLICY      DEDICATED:SWAP
JOBNODEMATCHPOLICY          EXACTNODE
NODEACCESSPOLICY            SHARED
NODEMAXLOAD                 3.5

DEFERTIME                   0
0
LOGDIR                      /var/spool/maui/log

LOGFILEROLLDEPTH            10
STATDIR                     /var/spool/maui/stats
###test to help with running on otis nodes:
ENABLEMULTIREQJOBS         TRUE
-- 
Brad Mecklenburg





