Maui functions by manipulating five primary, elementary
objects. These are jobs, nodes, reservations, QOS structures, and
policies. In addition to these, multiple minor elementary objects
and composite objects are also utilized. These objects are also defined
in the scheduling dictionary.
Job information is provided to the Maui scheduler from a resource
manager such as Loadleveler, PBS, Wiki, or LSF. Job attributes include
ownership of the job, job state, amount and type of resources required
by the job, and a wallclock limit, indicating how long the resources are
required. A job consists of one or more requirements
each of which requests a number of resources of a given type. For
example, a job may consist of two requirements, the first asking for '1
IBM SP node with at least 512 MB of RAM' and the second asking for '24
IBM SP nodes with at least 128 MB of RAM'. Each requirements consists
of one or more tasks where a task is defined
as the minimal independent unit of resources. By default, each task
is equivalent to one processor. In SMP environments, however, users
may wish to tie one or more processors together with a certain amount of
memory and/or other resources.
22.214.171.124.1 Requirement (or Req)
A job requirement (or req) consists of a request
for a single type of resources. Each requirement consists of the
- Task Definition
A specification of the elementary resources which
compose an individual task.
- Resource Constraints
A specification of conditions which must be met in
order for resource matching to occur. Only resources from
nodes which meet all resource constraints may be allocated to the
- Task Count
The number of task instances required by the req.
- Task List
The list of nodes on which the task instances have
- Req Statistics
Statistics tracking resource utilization
As far as Maui is concerned, a node is a collection
of resources with a particular set of associated attributes. In most
cases, it fits nicely with the canonical world view of a node such as a
PC cluster node or an SP node. In these cases, a node is defined
as one or more CPU's, memory, and possibly other compute resources such
as local disk, swap, network adapters, software licenses, etc. Additionally,
this node will described by various attributes such as an architecture
type or operating system. Nodes range in size from small uniprocessor
PC's to large SMP systems where a single node may consist of hundreds of
CPU's and massive amounts of memory.
Information about nodes is provided to the
scheduler chiefly by the resource manager. Attributes include node
state, configured and available resources (i.e., processors, memory, swap,
etc.), run classes supported, etc.
126.96.36.199 Advance Reservations
An advance reservation is an object which dedicates a block of
specific resources for a particular use. Each reservation consists of a
list of resources, an access control list, and a time range for which this
access control list will be enforced. The reservation prevents the listed
resources from being used in a way not described by the access control
list during the time range specified. For example, a reservation could
reserve 20 processors and 10 GB of memory for users Bob and John from Friday
6:00 AM to Saturday 10:00 PM'. Maui uses advance reservations extensively
to manage backfill, guarantee resource availability for active jobs, allow
service guarantees, support deadlines, and enable metascheduling. Maui
also supports both regularly recurring reservations and the creation of
dynamic one time reservations for special needs. Advance reservations are
described in detail in the advance reservation overview.
Policies are generally specified via a config file and serve
to control how and when jobs start. Policies include job prioritization,
fairness policies, fairshare configuration policies, and scheduling policies.
Jobs, nodes, and reservations all deal with the abstract
concept of a resource. A resource in the Maui world is one of the
Processors are specified with a simple
Real memory or 'RAM' is specified in
Virtual memory or 'swap' is specified
in megabytes (MB).
Local disk is specified in megabytes
In addition to these elementary resource types, there
are two higher level resource concepts used within Maui. These are
the task and the processor equivalent, or PE.
A task is a collection of elementary resources
which must be allocated together within a single node.
For example, a task may consist of one processor, 512MB or memory, and
2 GB of local disk. A key aspect of a task is that the resources
associated with the task must be allocated as an atomic unit, without
spanning node boundaries. A task requesting 2 processors cannot be
satisfied by allocating 2 uniprocessor nodes, nor can a task requesting
1 processor and 1 GB of memory be satisfied by allocating 1 processor on
one node and memory on another.
In Maui, when jobs or reservations request resources,
they do so in terms of tasks typically using a task count and a
task definition. By default, a task maps directly to a single
processor within a job and maps to a full node within reservations.
In all cases, this default definition can be overridden by specifying a
new task definition.
Within both jobs and reservations, depending on task
definition, it is possible to have multiple tasks from the same job mapped
to the same node. For example, a job requesting 4 tasks using the
default task definition of 1 processor per task, can be satisfied by two
dual processor nodes.
The concept of the processor equivalent,
or PE, arose out of the need to translate multi-resource consumption requests
into a scalar value. It is not an elementary resource, but rather,
a derived resource metric. It is a measure of the actual impact
of a set of requested resources by a job on the total resources available
system wide. It is calculated as:
PE = MAX(ProcsRequestedByJob / TotalConfiguredProcs,
MemoryRequestedByJob / TotalConfiguredMemory,
DiskRequestedByJob / TotalConfiguredDisk,
SwapRequestedByJob / TotalConfiguredSwap) * TotalConfiguredProcs
For example, say a job requested
20% of the total processors and 50% of the total memory of a 128 processor
MPP system. Only two such jobs could be supported by this system.
The job is essentially using 50% of all available resources since the system
can only be scheduled to its most constrained resource, in this case memory.
The processor equivalents for this job should be 50% of the processors,
or PE = 64.
Let's make the calculation
concrete with one further example. Assume a homogeneous 100 node
system with 4 processors and 1 GB of memory per node. A job is submitted
requesting 2 processors and 768 MB of memory. The PE for this job
would be calculated as:
PE = MAX(2/(100*4), 768/(100*1024)) * (100*4) = 3.
This result makes sense since
the job would be consuming 3/4 of the memory on a 4 processor node.
The calculation works equally
well on homogeneous or heterogeneous systems, uniprocessor or large way
188.8.131.52 Class (or Queue)
A class (or queue) is a logical container
object which can be used to implicitly or explicitly apply policies to
jobs. In most cases, a class is defined and configured within the
resource manager and associated with one or more of the following attributes
|Default Job Attributes
||A queue may be associated with a default job duration, default size,
or default resource requirements
||A queue may constrain job execution to a particular set of hosts
||A queue may constrain the attributes of jobs which may submitted including
setting limits such as max wallclock time, minimum number of processors,
||A queue may constrain who may submit jobs into it based on user lists,
group lists, etc.
||A queue may associate special privileges with jobs including adjusted
As stated previously, most resource managers allow
full class configuration within the resource manager. Where additional
class configuration is required, the CLASSCFG
parameter may be used.
Maui tracks class usage as a consumable resource
allowing sites to limit the number of jobs using a particular class.
This is done by monitoring class initiators which may be considered
to be a ticket to run in a particular class. Any compute node may
simultaneously support several types of classes and any number of initiators
of each type. By default, nodes will have a one-to-one mapping between
class initiators and configured processors. For every job task
run on the node, one class initiator of the appropriate type is consumed.
For example, a 3 processor job submitted to the class batch will consume
three batch class initiators on the nodes where it is run.
Using queues as consumable resources allows sites
to specify various policies by adjusting the class initiator to node mapping.
For example, a site running serial jobs may want to allow a particular
8 processor node to run any combination of batch and special jobs subject
to the following constraints:
- only 8 jobs of any type allowed simultaneously
- no more than 4 special jobs allowed simultaneously
To enable this policy, the site may set the node's
MAXJOB policy to 8 and configure the node with 4 special class initiators
and 8 batch class initiators.
Note that in virtually all cases jobs have a one-to-one
correspondence between processors requested and class initiators required.
However, this is not a requirement and, with special configuration sites
may choose to associate job tasks with arbitrary combinations of class
In displaying class initiator status, Maui signifies
the type and number of class initiators available using the format [<CLASSNAME>:<CLASSCOUNT>].
This is most commonly seen in the output of node status commands indicating
the number of configured and available class initiators, or in job status
commands when displaying class initiator requirements.
Node can also be configured to support
various 'arbitrary resources'. Information about such resources can
be specified using the NODECFG parameter.
For example, a node may be configured to have '256 MB RAM, 4 processors,
1 GB Swap, and 2 tape drives'.