[Mauiusers] Problem with node allocation.
justin.finnerty at uni-oldenburg.de
Wed Nov 15 04:12:44 MST 2006
I am having a problem getting the torque/maui system to apply a policy in
the way jobs are allocated to nodes in our cluster. I have browsed the
mailing list archive but have found no answers. I am hoping for some
help.

Firstly, the cluster consists of three node types:
type1: has fast IO system, 2 cpus
type2: 2 cpus
type3: 4 cpus
I have a "serial-io" queue that limits jobs to type1 nodes. I then have a
default queue for all other jobs.
Getting the "serial-io" queue to work was OK. The problem arises with the
default queue and allocating jobs. The allocation policy I would like is:
(A) If a job requests more than 2 cpus use type3 nodes [This always works]
(B) If a job requests only one node (max of 2 cpus) then it can be
allocated to, in order of preference type2, type3 and type1.
(C) If a job requests multiple nodes (max of 2 cpus) then it can be
allocated to, in order of preference type2 then type3.
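
To make the intent of policies A, B and C concrete, here is a minimal sketch in Python. It is only a model of the desired behaviour, not Maui or torque configuration; the function name and its arguments are hypothetical and chosen just for illustration.

```python
# Hypothetical model of the desired allocation policy for the default queue.
# This is not Maui code; it only restates policies A, B and C.

def preferred_node_types(nodes, cpus_per_node):
    """Return the acceptable node types, most preferred first."""
    if cpus_per_node > 2:
        # Policy A: only the 4-cpu type3 nodes can satisfy the request.
        return ["type3"]
    if nodes == 1:
        # Policy B: single-node job, type1 allowed as a last resort.
        return ["type2", "type3", "type1"]
    # Policy C: multi-node job, type1 nodes are never used.
    return ["type2", "type3"]


if __name__ == "__main__":
    print(preferred_node_types(nodes=1, cpus_per_node=2))
    print(preferred_node_types(nodes=4, cpus_per_node=2))
    print(preferred_node_types(nodes=1, cpus_per_node=4))
```

The point of the sketch is that type2 always comes before type3 for jobs needing at most 2 cpus per node, which is the part I cannot get maui to honour.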
I think I can assume that policy A will always work or maui is seriously
broken. However, policies B and C are very difficult to get working.
What I have so far:
The torque default queue splits jobs into execution queues "single" for
single node jobs (policy B) and "normal" for multi-node jobs (policy C).
This works reliably.
The problem is that maui tends to allocate jobs to the type3 nodes in
preference to the type2 nodes regardless of what I do. We only have two
type3 nodes so I want 2cpu/node jobs to use these only as a last resort.
But we have many more 2cpu/node jobs than 4cpu/node jobs so I don't want
to exclude using type3 nodes altogether.
Current Maui cfg. (Summary)
type1 nodes are in partition "serial"
type2/type3 nodes are in partition "normal"
SRCFG[type1] HOSTLIST=n0[1-5] CLASSLIST=serial-io,single-
SRCFG[type3] HOSTLIST=n33,n34 CLASSLIST=normal-,single-
CLASSCFG[serial-io] PDEF=serial DEFAULT.FEATURES=type1 PLIST=serial
CLASSCFG[single] PDEF=normal PLIST=normal,serial
CLASSCFG[normal] PDEF=normal PLIST=normal
I have confirmed that the configuration above is applied properly. We
also have fair sharing enabled and that appears to work OK too.
Node allocation policy (currently):
NODECFG[DEFAULT] PARTITION=normal PRIORITY=1000 PRIORITYF='PRIORITY + PREF
# For each type 1 node
NODECFG[XX] PARTITION=serial PRIORITY=10
# For each type 2 node
NODECFG[XX] PARTITION=normal PRIORITY=1000
# For each type 3 node
NODECFG[XX] PARTITION=normal PRIORITY=100
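
The ordering I expect from these values can be sketched in Python as follows. This assumes that priority-based allocation picks the node with the highest value first and that the priority function effectively reduces to the static PRIORITY setting; both are assumptions on my part, and the type names are just labels for the three groups of nodes.

```python
# Static PRIORITY values copied from the NODECFG lines above.
# Assumption: higher priority means the node is allocated first.
node_priority = {"type1": 10, "type2": 1000, "type3": 100}


def allocation_order(priorities):
    """Return node types sorted from most to least preferred."""
    return sorted(priorities, key=priorities.get, reverse=True)


order = allocation_order(node_priority)
print(order)  # ['type2', 'type3', 'type1']
```

Under those assumptions type2 should be chosen ahead of type3, which is the opposite of what I observe.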
Now I have played around with the different NODEALLOCATIONPOLICY settings,
especially PRIORITY and PRIORITYF, but nothing seems to change the
preference for type3 node allocation over type2 (or type1).
Additionally I am not entirely sure I need the partitions; I would like
the configuration to be as simple as possible.
Any comments or suggestions would be appreciated.
Dr Justin Finnerty
Rm W3-1-218 Ph 49 (441) 798 3726
Carl von Ossietzky Universität Oldenburg