[Mauiusers] Problem with node allocation.

Justin Finnerty justin.finnerty at uni-oldenburg.de
Wed Nov 15 04:12:44 MST 2006


Hello

I am having a problem getting the torque/maui system to apply a policy to
the way jobs are allocated to nodes in our cluster.  I have browsed the
mailing list archive but found no answers, so I am hoping for some
suggestions.

Firstly, the cluster consists of three node types:

type1: 2 cpus, fast IO system
type2: 2 cpus
type3: 4 cpus

I have a "serial-io" queue that limits jobs to type1 nodes.  I then have a
default queue for all other jobs.

Getting the "serial-io" queue to work was OK.  The problem arises with
allocating jobs from the default queue.  The allocation policy I would
like is as follows:

(A) If a job requests more than 2 cpus, use type3 nodes [this always works].
(B) If a job requests only one node (max of 2 cpus), it can be
allocated to, in order of preference, type2, type3 and type1.
(C) If a job requests multiple nodes (max of 2 cpus), it can be
allocated to, in order of preference, type2 then type3.
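To make the intended policy unambiguous, here is a minimal Python sketch of rules (A) to (C) as a function returning the acceptable node types, most preferred first. It is only a model of the desired behaviour, not Maui code; `preferred_node_types` is a hypothetical name, and it assumes "more than 2 cpus" in (A) means more than 2 cpus per node.

```python
# Hypothetical model of the desired allocation policy (A)-(C); not Maui code.
def preferred_node_types(nodes: int, cpus_per_node: int) -> list[str]:
    """Return the acceptable node types for a job, most preferred first."""
    if cpus_per_node > 2:
        # (A) only type3 nodes have more than 2 cpus
        return ["type3"]
    if nodes == 1:
        # (B) single-node job: type2, then type3, then type1 as a last resort
        return ["type2", "type3", "type1"]
    # (C) multi-node job: type2, then type3, never type1
    return ["type2", "type3"]

print(preferred_node_types(1, 2))  # -> ['type2', 'type3', 'type1']
print(preferred_node_types(4, 2))  # -> ['type2', 'type3']
print(preferred_node_types(2, 4))  # -> ['type3']
```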

I think I can assume that policy A will always work; otherwise maui is
seriously broken.

However, policies B and C are very difficult to get working.

What I have so far:

The torque default queue splits jobs into two execution queues: "single"
for single-node jobs (policy B) and "normal" for multi-node jobs (policy C).
This works reliably.

The problem is that maui tends to allocate jobs to the type3 nodes in
preference to the type2 nodes regardless of what I do.  We only have two
type3 nodes, so I want 2 cpu/node jobs to use them only as a last resort.
But we have many more 2 cpu/node jobs than 4 cpu/node jobs, so I don't want
to exclude type3 nodes altogether.

Current Maui cfg (summary):

Partitions:
type1 nodes are in partition "serial"
type2/type3 nodes are in partition "normal"

Standing Reservations:

SRCFG[type1]      HOSTLIST=n0[1-5] CLASSLIST=serial-io,single-
                  NODEFEATURES=type1 PERIOD=INFINITY

SRCFG[type3]      HOSTLIST=n33,n34 CLASSLIST=normal-,single-
                  NODEFEATURES=type3 PERIOD=INFINITY
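The trailing `-` on some CLASSLIST entries above is, as far as I understand the Maui reservation documentation, an affinity suffix (`+` positive, `-` negative, `=` neutral), and negative affinity is the mechanism meant to make reserved nodes a last resort. The Python sketch below only shows how such a list decomposes under that assumption; `parse_acl` is a hypothetical helper, not part of Maui, and the suffix meanings (including the unsuffixed default) should be checked against the documentation for the installed Maui version.

```python
# Assumed Maui ACL affinity suffixes: '+' positive, '-' negative, '=' neutral.
AFFINITY = {"+": "positive", "-": "negative", "=": "neutral"}

def parse_acl(acl: str) -> list[tuple[str, str]]:
    """Split an ACL string like "normal-,single-" into (name, affinity) pairs."""
    entries = []
    for item in acl.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries, e.g. from a trailing comma
        if item[-1] in AFFINITY:
            entries.append((item[:-1], AFFINITY[item[-1]]))
        else:
            # No suffix: assumed here to mean the default (positive) affinity
            entries.append((item, "positive"))
    return entries

print(parse_acl("serial-io,single-"))  # -> [('serial-io', 'positive'), ('single', 'negative')]
print(parse_acl("normal-,single-"))    # -> [('normal', 'negative'), ('single', 'negative')]
```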

Class config:

CLASSCFG[serial-io]     PDEF=serial DEFAULT.FEATURES=type1 PLIST=serial
CLASSCFG[single]        PDEF=normal PLIST=normal,serial
CLASSCFG[normal]        PDEF=normal PLIST=normal
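As a quick sanity check on the partition and PLIST settings, the following Python sketch models which node types each class can reach through its partitions alone. It deliberately ignores the standing reservations, DEFAULT.FEATURES and any preference ordering; the dictionaries simply restate the Partitions and CLASSCFG summary above, and the function name is hypothetical.

```python
# Hypothetical model of the partition setup summarized above (not Maui code).
PARTITION_NODE_TYPES = {
    "serial": {"type1"},
    "normal": {"type2", "type3"},
}

# PLIST for each class, copied from the CLASSCFG lines
CLASS_PLIST = {
    "serial-io": ["serial"],
    "single": ["normal", "serial"],
    "normal": ["normal"],
}

def reachable_node_types(job_class: str) -> set[str]:
    """Union of the node types in every partition the class may use."""
    types: set[str] = set()
    for partition in CLASS_PLIST[job_class]:
        types |= PARTITION_NODE_TYPES[partition]
    return types

for cls in CLASS_PLIST:
    print(cls, sorted(reachable_node_types(cls)))
# serial-io ['type1']
# single ['type1', 'type2', 'type3']
# normal ['type2', 'type3']
```

The reachable sets match the node types allowed by policies (B) and (C); what the partitions cannot express is the order of preference within each set.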

I have confirmed that the configuration above is applied properly.  We
also have fair sharing enabled and that appears to work OK too.

Node allocation policy (currently):

NODEALLOCATIONPOLICY    PRIORITY

NODECFG[DEFAULT] PARTITION=normal PRIORITY=1000 PRIORITYF='PRIORITY + PREF + RESAFFINITY'
# For each type 1 node
NODECFG[XX] PARTITION=serial PRIORITY=10
# For each type 2 node
NODECFG[XX] PARTITION=normal PRIORITY=1000
# For each type 3 node
NODECFG[XX] PARTITION=normal PRIORITY=100
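To see what ordering the configured priorities imply on their own, here is a Python sketch that scores each node type with the expression `PRIORITY + PREF + RESAFFINITY` and sorts highest first. It is an illustration, not Maui's implementation: it assumes each term enters with weight 1 as written, and the PREF and RESAFFINITY values are placeholders set to 0, since Maui computes those per job.

```python
# Hypothetical illustration of ranking nodes by
# PRIORITYF='PRIORITY + PREF + RESAFFINITY' (not Maui's implementation).
# PREF and RESAFFINITY are placeholders; Maui derives them per job.
NODES = {
    "type1": {"PRIORITY": 10, "PREF": 0, "RESAFFINITY": 0},
    "type2": {"PRIORITY": 1000, "PREF": 0, "RESAFFINITY": 0},
    "type3": {"PRIORITY": 100, "PREF": 0, "RESAFFINITY": 0},
}

def node_score(terms: dict[str, int]) -> int:
    # Each term is assumed to contribute with weight 1, as written in PRIORITYF
    return terms["PRIORITY"] + terms["PREF"] + terms["RESAFFINITY"]

def rank_nodes(nodes: dict[str, dict[str, int]]) -> list[str]:
    # Highest score first
    return sorted(nodes, key=lambda name: node_score(nodes[name]), reverse=True)

print(rank_nodes(NODES))  # -> ['type2', 'type3', 'type1']
```

With the other terms at zero, the configured PRIORITY values alone rank the node types in the order policy (B) asks for.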

I have played around with the different NODEALLOCATIONPOLICY settings,
especially PRIORITY, and with the PRIORITYF expression, but nothing seems
to change the preference for allocating type3 nodes over type2 (or type1).

Additionally I am not entirely sure I need the partitions; I would like
the configuration to be as simple as possible.

Any comments or suggestions would be appreciated.

Cheers
Justin


-- 
Dr Justin Finnerty
Rm W3-1-218         Ph 49 (441) 798 3726
Carl von Ossietzky Universität Oldenburg



