Moab Workload Manager® for Grids

17.10 Grid Scheduling Policies

17.10.1 Peer-to-Peer Resource Affinity Overview

The concept of resource affinity stems from a number of facts:

  • Certain compute architectures are able to execute certain compute jobs more effectively than others.
  • From a given location, staging jobs to various clusters may require more expensive allocations, more data and network resources, and more use of system services.
  • Certain compute resources are owned by external organizations and should be used sparingly.

Regardless of the reason, Moab servers allow the use of peer resource affinity to guide jobs to the clusters that make the best fit according to a number of criteria.

At a high level, this is accomplished by defining job profiles and associating each profile with one or more peers. Each association can adjust the job's estimated execution time on that peer and the job's affinity for that peer.

17.10.2 Peer Allocation Policies

A direct way to assign a peer allocation algorithm is with the PARALLOCATIONPOLICY parameter. Legal values are listed in the following table:

Value            Description
BestFit          Allocates resources from the eligible peer with the fewest available resources, measured in tasks (minimizes fragmentation of large resource blocks).
BestFitP         Allocates resources from the eligible peer with the fewest available resources, measured as a percentage of configured resources (minimizes fragmentation of large resource blocks).
FirstStart       Allocates resources from the eligible peer that can start the job the soonest.
FirstCompletion  Allocates resources from the eligible peer that can complete the job the soonest (takes into account data staging time and job-specific machine speed).
LoadBalance      Allocates resources from the eligible peer with the most available resources, measured in tasks (balances workload distribution across potential peers).
LoadBalanceP     Allocates resources from the eligible peer with the most available resources, measured as a percentage of configured resources (balances workload distribution across potential peers).
RoundRobin       Allocates resources from the eligible peer that was allocated least recently.

NOTE: The mdiag -t -v command can be used to view the currently calculated partition priority values.
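The selection rules in the table can be modeled as simple minimum or maximum choices over a list of eligible peers. The sketch below is an illustration only, not Moab's implementation: the `Peer` fields, `select_peer`, and the FirstCompletion estimate (start delay plus data staging time plus walltime divided by machine speed) are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    """Hypothetical snapshot of one eligible peer (not a Moab data structure)."""
    name: str
    available_tasks: int     # tasks currently free on the peer
    configured_tasks: int    # tasks configured on the peer
    last_allocated: int      # logical time of last allocation (smaller = older)
    earliest_start: float    # estimated seconds until the job could start
    stage_time: float = 0.0  # estimated seconds to stage the job's data
    speed: float = 1.0       # job-specific relative machine speed


def pct_free(p: Peer) -> float:
    # Available resources as a fraction of configured resources.
    return p.available_tasks / p.configured_tasks


def est_completion(p: Peer, walltime: float) -> float:
    # Assumption: start delay + data staging + walltime divided by machine speed.
    return p.earliest_start + p.stage_time + walltime / p.speed


def select_peer(policy: str, peers: List[Peer], walltime: float) -> Peer:
    """Pick one peer from a non-empty list according to a policy name."""
    rules = {
        "BestFit": (min, lambda p: p.available_tasks),
        "BestFitP": (min, pct_free),
        "FirstStart": (min, lambda p: p.earliest_start),
        "FirstCompletion": (min, lambda p: est_completion(p, walltime)),
        "LoadBalance": (max, lambda p: p.available_tasks),
        "LoadBalanceP": (max, pct_free),
        "RoundRobin": (min, lambda p: p.last_allocated),
    }
    choose, key = rules[policy]
    return choose(peers, key=key)


# Made-up peer data for illustration.
peers = [
    Peer("clusterB", available_tasks=64, configured_tasks=128,
         last_allocated=5, earliest_start=0.0),
    Peer("clusterC", available_tasks=128, configured_tasks=1024,
         last_allocated=2, earliest_start=600.0, speed=4.0),
]
for policy in ("BestFit", "BestFitP", "LoadBalance", "LoadBalanceP",
               "FirstStart", "FirstCompletion", "RoundRobin"):
    print(policy, select_peer(policy, peers, walltime=3600.0).name)
```

With these made-up numbers, BestFit picks clusterB (fewer free tasks) while BestFitP picks clusterC (smaller free fraction), which shows why the task-based and percentage-based variants can disagree.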

17.10.3 Peer-to-Peer Job Profiles

Because the quality of the fit between a given job and a given peer resource may be completely independent of any job credential, Moab uses job profiles to categorize jobs for affinity purposes. A job profile is specified using the JOBCFG parameter and can specify a number of job attributes, including WALLTIME, APPLICATIONTYPE, NODECOUNT, PROCCOUNT, NETWORK, MEMORY, DISK, ARCHITECTURE, and NODEFEATURES.

Example

moab.cfg
JOBCFG[profile.smp]  MEMORY=1024  NODEFEATURES=smp
...
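To make the matching step concrete, here is a sketch that parses a JOBCFG line like the one above and tests a job against it. The matching rules are assumptions for illustration, not documented Moab behavior: MEMORY is treated as a minimum memory request and NODEFEATURES as a comma-separated list of required features. `JobProfile`, `Job`, `parse_jobcfg`, and `matches` are hypothetical names, and only the two attributes used in the example are handled.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class JobProfile:
    """Hypothetical in-memory form of one JOBCFG[...] line."""
    name: str
    memory: Optional[int] = None                           # MEMORY=, assumed minimum
    node_features: Set[str] = field(default_factory=set)  # NODEFEATURES=


@dataclass
class Job:
    memory: int
    node_features: Set[str]


def parse_jobcfg(line: str) -> JobProfile:
    # e.g. "JOBCFG[profile.smp]  MEMORY=1024  NODEFEATURES=smp"
    head, *attrs = line.split()
    name = head[head.index("[") + 1 : head.index("]")]
    prof = JobProfile(name)
    for attr in attrs:
        key, value = attr.split("=", 1)
        if key == "MEMORY":
            prof.memory = int(value)
        elif key == "NODEFEATURES":
            prof.node_features = set(value.split(","))  # assumed comma-separated
        # Other attributes are ignored in this sketch.
    return prof


def matches(job: Job, prof: JobProfile) -> bool:
    # Assumption: MEMORY is a lower bound and every listed feature is required.
    if prof.memory is not None and job.memory < prof.memory:
        return False
    return prof.node_features <= job.node_features


prof = parse_jobcfg("JOBCFG[profile.smp]  MEMORY=1024  NODEFEATURES=smp")
print(matches(Job(memory=2048, node_features={"smp"}), prof))  # True
print(matches(Job(memory=512, node_features={"smp"}), prof))   # False
```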

17.10.4 Peer-to-Peer Job Affinity

Peer affinity is established by associating one or more job profiles with a peer resource manager interface and by tying a speed or affinity factor to that connection using the SPEED and ALLOCATIONAFFINITY attributes of the RMCFG parameter.

Example

moab.cfg
SCHEDCFG[clusterA]   SCHEDULINGPOLICY=earliestcompletion

JOBCFG[profile.smp]  MEMORY=1024  NODEFEATURES=smp
RMCFG[clusterB]      TYPE=MOAB ALLOCATIONAFFINITY[profile.smp]=1.0  SPEED[profile.smp]=4.2

...

In the preceding example, the scheduler prefers to route jobs that meet the profile.smp job profile to clusterB and will also scale the walltime for these jobs by the SPEED factor.
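The per-profile values on the RMCFG line and the walltime adjustment can be sketched as follows. This is a hypothetical illustration, not Moab's parser: `parse_rmcfg` extracts the ALLOCATIONAFFINITY and SPEED values keyed by profile, and `effective_walltime` assumes SPEED is a relative speed (a factor of 4.2 meaning the job runs 4.2 times faster than on the reference system), so the requested walltime is divided by it. The direction of that scaling is an assumption.

```python
import re
from typing import Dict, Tuple

# Matches attributes such as ALLOCATIONAFFINITY[profile.smp]=1.0 or SPEED[a,b]=4.2
ATTR = re.compile(r"(ALLOCATIONAFFINITY|SPEED)\[([^\]]+)\]=(\S+)")


def parse_rmcfg(line: str) -> Dict[Tuple[str, str], float]:
    """Map (attribute, profile) -> numeric value for one RMCFG line."""
    out: Dict[Tuple[str, str], float] = {}
    for attr, profiles, value in ATTR.findall(line):
        for prof in profiles.split(","):
            out[(attr, prof)] = float(value)
    return out


def effective_walltime(requested: float, speed: float) -> float:
    # Assumption: a peer with SPEED=s finishes the job s times faster than the
    # reference system, so the walltime estimate is divided by s.
    return requested / speed


cfg = parse_rmcfg("RMCFG[clusterB]      TYPE=MOAB "
                  "ALLOCATIONAFFINITY[profile.smp]=1.0  SPEED[profile.smp]=4.2")
speed = cfg[("SPEED", "profile.smp")]
print(effective_walltime(8 * 3600.0, speed))  # about 6857 seconds under this assumption
```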

17.10.5 Peer-to-Peer Min/Max Job Affinity

In some cases, job profiles should only be applied to ranges of resource requests. For example, a site may want to route jobs with low, medium, and high memory requirements to different locations. This can be done using Min and Max job affinities as in the following example:

Example

moab.cfg
SCHEDCFG[clusterA]   SCHEDULINGPOLICY=earliestcompletion

JOBCFG[profile.low]  MEMORY=256
JOBCFG[profile.med]  MEMORY=1024
JOBCFG[profile.high] MEMORY=2048 

RMCFG[clusterA]      TYPE=PBS ALLOCATIONAFFINITY[profile.low,profile.med]=1.0

RMCFG[clusterB]      TYPE=MOAB ALLOCATIONAFFINITY[profile.med,profile.high]=1.0
RMCFG[clusterB]      SPEED[profile.med,profile.high]=4.2

RMCFG[clusterC]      TYPE=MOAB ALLOCATIONAFFINITY[profile.high]=1.0
RMCFG[clusterC]      SPEED[profile.med,profile.high]=2.7

...

In the preceding example, low-memory jobs have an affinity to run locally on clusterA, medium-memory jobs are targeted to clusterB, and high-memory jobs are targeted to run on clusterC.
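One way to picture how the three MEMORY thresholds become ranges is sketched below. It assumes each MEMORY value is a minimum and that a job takes the matching profile with the largest threshold, which splits requests into low (256 up to 1024), medium (1024 up to 2048), and high (2048 and above) bands. The affinity table is copied from the example; `profile_for` and `targeted_clusters` are hypothetical helpers.

```python
from typing import List, Optional

# MEMORY thresholds from the JOBCFG lines, assumed to be minimum requests.
PROFILES = {"profile.low": 256, "profile.med": 1024, "profile.high": 2048}

# ALLOCATIONAFFINITY values from the RMCFG lines in the example.
AFFINITY = {
    "clusterA": {"profile.low": 1.0, "profile.med": 1.0},
    "clusterB": {"profile.med": 1.0, "profile.high": 1.0},
    "clusterC": {"profile.high": 1.0},
}


def profile_for(memory: int) -> Optional[str]:
    # Assumption: the job belongs to the highest threshold it meets.
    met = [(threshold, name) for name, threshold in PROFILES.items()
           if memory >= threshold]
    return max(met)[1] if met else None


def targeted_clusters(memory: int) -> List[str]:
    # Clusters that list a positive affinity for the job's profile.
    profile = profile_for(memory)
    return sorted(cluster for cluster, aff in AFFINITY.items()
                  if profile is not None and aff.get(profile, 0.0) > 0.0)


for mem in (512, 1500, 4096):
    print(mem, profile_for(mem), targeted_clusters(mem))
```

Under this reading, medium-memory jobs list both clusterA and clusterB, and high-memory jobs list both clusterB and clusterC; the final choice among targeted clusters then depends on factors such as SPEED and the peer allocation policy.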

17.10.6 Peer-to-Peer Job Blocking

Jobs can be blocked from ever running on a given cluster by setting that cluster's ALLOCATIONAFFINITY for the job's profile to 0.
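For example, adding ALLOCATIONAFFINITY[profile.low]=0 to the RMCFG line for clusterB would keep jobs matching profile.low off clusterB. The filtering step can be sketched as below; `unblocked_peers` is a hypothetical helper, and treating a peer with no entry for the profile as still eligible (neither preferred nor blocked) is an assumption.

```python
from typing import Dict, List


def unblocked_peers(affinity: Dict[str, Dict[str, float]], profile: str) -> List[str]:
    """Peers that may still run jobs of the given profile.

    An explicit affinity of 0 blocks the profile on that peer. Assumption:
    a peer with no entry for the profile is neither preferred nor blocked.
    """
    return sorted(peer for peer, aff in affinity.items()
                  if aff.get(profile) != 0.0)


# clusterB blocks profile.low with an explicit 0; clusterC has no entry.
affinity = {
    "clusterA": {"profile.low": 1.0},
    "clusterB": {"profile.low": 0.0},
    "clusterC": {},
}
print(unblocked_peers(affinity, "profile.low"))  # ['clusterA', 'clusterC']
```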

17.10.7 Peer-to-Peer Job Automatic Profiles

If the SPEED attribute is set to AUTO, Moab identifies jobs that meet the specified job profile criteria, monitors and records their performance, and then adjusts the effective value of SPEED and ALLOCATIONAFFINITY accordingly.

Example

moab.cfg
SCHEDCFG[clusterA]   SCHEDULINGPOLICY=earliestcompletion

JOBCFG[matlab]   APPLICATION=matlab
JOBCFG[geo3]     APPLICATION=geo3
JOBCFG[nwchem]   APPLICATION=nwchem

RMCFG[clusterB]  TYPE=MOAB SPEED[matlab]=AUTO SPEED[geo3]=AUTO SPEED[nwchem]=AUTO
RMCFG[clusterC]  TYPE=MOAB SPEED[matlab]=AUTO SPEED[geo3]=AUTO SPEED[nwchem]=AUTO

...
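The section does not say how the learned SPEED value is computed. Purely as an assumption for illustration, the sketch below treats the observed speed of a finished job as its reference runtime divided by its actual runtime on the peer, and keeps an exponential moving average for each (profile, peer) pair. `AutoSpeed`, `alpha`, and the reference-runtime input are hypothetical, and the ALLOCATIONAFFINITY adjustment is not modeled.

```python
from typing import Dict, Tuple


class AutoSpeed:
    """Hypothetical per-(profile, peer) speed learner for SPEED[...]=AUTO."""

    def __init__(self, alpha: float = 0.3, initial: float = 1.0) -> None:
        self.alpha = alpha      # weight given to the newest observation
        self.initial = initial  # speed assumed before any observation
        self._speed: Dict[Tuple[str, str], float] = {}

    def get(self, profile: str, peer: str) -> float:
        return self._speed.get((profile, peer), self.initial)

    def record(self, profile: str, peer: str,
               reference_runtime: float, actual_runtime: float) -> float:
        """Fold one finished job into the estimate and return the new value."""
        observed = reference_runtime / actual_runtime
        old = self.get(profile, peer)
        new = (1.0 - self.alpha) * old + self.alpha * observed
        self._speed[(profile, peer)] = new
        return new


auto = AutoSpeed(alpha=0.5)
# A matlab job expected to take 400 s on the reference system finished in 100 s.
print(auto.record("matlab", "clusterB", 400.0, 100.0))  # 2.5
print(auto.record("matlab", "clusterB", 400.0, 100.0))  # 3.25
print(auto.get("geo3", "clusterB"))                      # 1.0 (no data yet)
```

A larger alpha reacts faster to recent jobs; a smaller one smooths out noisy runtimes.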

17.10.8 Importing External Job Profiling Information

Some systems are able to generate job profiles that can quickly model performance based on resource configuration information. This profiling information can be precomputed and associated with a job profile/peer cluster pair, or it can be calculated dynamically by specifying the SPEED or ALLOCATIONAFFINITY attribute as a URL.

Example

moab.cfg
SCHEDCFG[clusterA]   SCHEDULINGPOLICY=earliestcompletion

JOBCFG[matlab]   APPLICATION=matlab

RMCFG[clusterB]  TYPE=MOAB SPEED[matlab]=exec:///opt/bin/matlab-model 
RMCFG[clusterC]  TYPE=MOAB SPEED[matlab]=exec:///opt/bin/matlab-model

...

In the preceding example, jobs that fit the matlab job profile have their relative speed on each peer cluster calculated by executing the /opt/bin/matlab-model program. This program is passed command-line arguments that describe the job and resources, and it reports back an estimated speed factor.
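The section does not document the argument format, so the following is only a sketch of what such a model program could look like. It assumes hypothetical KEY=VALUE arguments (`cpu_ghz`, `node_mem`, `job_mem`), a toy performance model calibrated against a hypothetical 2.0 GHz reference machine, and that the speed factor is reported on standard output.

```python
import sys
from typing import Dict, List

# Hypothetical reference machine the toy model is calibrated against.
REFERENCE_GHZ = 2.0


def parse_args(argv: List[str]) -> Dict[str, str]:
    # Assumption: arguments arrive as KEY=VALUE pairs; the real format may differ.
    return dict(arg.split("=", 1) for arg in argv if "=" in arg)


def estimate_speed(info: Dict[str, str]) -> float:
    # Toy model: speed scales with clock rate relative to the reference machine
    # and is halved when the node has less memory than the job requests.
    speed = float(info.get("cpu_ghz", REFERENCE_GHZ)) / REFERENCE_GHZ
    if float(info.get("node_mem", 0)) < float(info.get("job_mem", 0)):
        speed *= 0.5
    return speed


def main(argv: List[str]) -> None:
    # Report the estimated speed factor on standard output.
    print(f"{estimate_speed(parse_args(argv)):.2f}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For instance, running it with `cpu_ghz=3.0` would print `1.50` under this toy model.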