21.3 Dynamic and Malleable Jobs

21.3.1 Dynamic Job Overview

Moab supports both dynamic and malleable jobs, enabling internal and external job steering, management of autonomous job pools, and other features.

21.3.2 Dynamic Jobs

Dynamic jobs may be adjusted over their active lifetime, acquiring and releasing resources according to internal and external workload conditions. As a dynamic job's compute requirements grow, it can either explicitly request additional resources from Moab or simply report the growth as an increase in its internal load. Moab considers this internal load and the value (or priority) of the request relative to other requests, and determines whether the request can be satisfied.

If the request can be satisfied, Moab will allocate additional resources to the job and notify the job of the resource availability. If the dynamic job's load is reduced and other jobs could use additional resources, Moab will notify the job of the need to release a portion of the allocated resources.

With the dynamic allocation to and from these jobs, Moab can allow sites to effectively load balance compute resources amongst multiple competing consumers, enabling the mixing of static and dynamic cluster jobs with the loads of web, database, and visualization server farms.

Dynamic jobs are enabled by setting the DYNAMIC flag in moab.cfg, for example on a job profile:

JOBCFG[ServerJobs] FLAGS=DYNAMIC

21.3.2.1 Service Based Dynamic Jobs

Service-based dynamic jobs are often not considered batch jobs at all. Rather, they are consumers of compute resources, such as parallel databases or web service applications, that generally have dedicated resources associated with them. Within Moab, these services can be encapsulated as jobs and associated with ownership, priority, and resource requirement attributes. Using this information, Moab can intelligently distribute available resources across different request types to satisfy site mission objectives.

To manage a service as a dynamic job, a few key interfaces must be enabled to support the following:

  • query existence/health of job
  • determine current and/or anticipated future resource load
  • allow job to immediately request additional resources
  • allow job to release unneeded resources
  • signal job indicating additional resources have been allocated and are immediately available
  • signal job indicating that some allocated resources must be released
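The six interfaces listed above can be pictured as one contract between the scheduler and the service. The following is a minimal sketch, assuming a Python wrapper around the service; the class and method names (DynamicService, InMemoryService, on_nodes_added, and so on) are hypothetical and are not part of Moab. The toy implementation only tracks a host set so the contract can be exercised.

```python
from abc import ABC, abstractmethod
from typing import List


class DynamicService(ABC):
    """Hypothetical contract for a service managed as a dynamic job.

    Each abstract method maps to one interface in the list above.
    Names are illustrative only; Moab does not define this class.
    """

    @abstractmethod
    def is_healthy(self) -> bool:
        """Query existence/health of the job."""

    @abstractmethod
    def current_load(self) -> float:
        """Report current (or anticipated) resource load."""

    @abstractmethod
    def request_nodes(self, count: int) -> None:
        """Ask for additional resources immediately."""

    @abstractmethod
    def release_nodes(self, hosts: List[str]) -> None:
        """Voluntarily give back resources that are no longer needed."""

    @abstractmethod
    def on_nodes_added(self, hosts: List[str]) -> None:
        """Signal: additional resources are allocated and available now."""

    @abstractmethod
    def on_nodes_revoked(self, hosts: List[str]) -> None:
        """Signal: some allocated resources must be released."""


class InMemoryService(DynamicService):
    """Toy implementation that only tracks hosts, load, and requests."""

    def __init__(self, load: float = 0.0) -> None:
        self.hosts: set = set()
        self.load = load
        self.pending_requests = 0

    def is_healthy(self) -> bool:
        return True

    def current_load(self) -> float:
        # Load per allocated host; with no hosts, report the raw load.
        return self.load / len(self.hosts) if self.hosts else self.load

    def request_nodes(self, count: int) -> None:
        # A real service would contact the scheduler here; we only record it.
        self.pending_requests += count

    def release_nodes(self, hosts: List[str]) -> None:
        self.hosts.difference_update(hosts)

    def on_nodes_added(self, hosts: List[str]) -> None:
        self.hosts.update(hosts)

    def on_nodes_revoked(self, hosts: List[str]) -> None:
        self.hosts.difference_update(hosts)
```

Keeping the two signal handlers separate from the voluntary request/release calls mirrors the split between scheduler-initiated and job-initiated changes described above.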

Using some or all of these interfaces, independent services can be set up to fully manage and schedule workload inside the pool of allocated resources. This allows high-speed, transaction-based services to use context-sensitive algorithms to better optimize and load-balance incoming requests with no overhead from the batch system. The external batch system interoperates with the service only at a coarse-grained level, balancing the needs of multiple services and standard batch workload.

21.3.2.2 Translating Services to Dynamic Jobs with the Native Resource Manager

The native resource manager interface allows sites to wrap a new or existing service with the interfaces required to drive it as an integrated dynamic job, using scripts or Web- or SQL-based services. To fully integrate a service as a dynamic job, the following interfaces can be used:

Resource Manager Interfaces - Outgoing Moab-to-Job Requests

  • WorkloadQueryURL — Determine job load, ownership and resource attributes, and current state.
  • JobModifyURL — Specify that the job should adjust its current resource allocation.

Using WorkloadQueryURL for Obtaining Service State Information

The workload query URL is called each iteration and is responsible for reporting back basic information in the form of a job stanza for each service/application that is being monitored. This information should include job ID (which should be synchronized with any corresponding JOBCFG profiles specified in moab.cfg), current load information, job ownership (such as user, group, account), and optional resource constraint information to indicate what type of resources can be allocated to the job.

The workload query URL receives no input arguments and, in the case of an exec-type native resource manager interface, writes all output to STDOUT. The sample script tools/job.query.dyn.pl, included in the Moab tools directory, shows how to report application information as a job.
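As a rough illustration of an exec-type workload query script, the sketch below prints one single-line stanza per monitored service to STDOUT. The space-separated KEY=VALUE layout is an assumption made for illustration, as are the STATE and LOAD attribute names (LOAD in particular is hypothetical); consult the native resource manager documentation for the exact stanza syntax your Moab version expects. The job ID vizserver is chosen to match a JOBCFG[vizserver] profile.

```python
import sys
from typing import Dict, Mapping


def format_job_stanza(job_id: str, attrs: Mapping[str, object]) -> str:
    """Render one job as '<jobid> KEY=VALUE KEY=VALUE ...'.

    ASSUMPTION: this layout is illustrative, not Moab's documented format.
    """
    if not job_id or any(c.isspace() for c in job_id):
        raise ValueError("job ID must be a non-empty token without spaces")
    fields = [f"{key}={value}" for key, value in attrs.items()]
    return " ".join([job_id, *fields])


def main() -> None:
    # Hypothetical values for one monitored service. The job ID should
    # match the corresponding JOBCFG profile name in moab.cfg.
    services: Dict[str, Dict[str, object]] = {
        "vizserver": {
            "STATE": "Running",  # assumed attribute name
            "UNAME": "viz",
            "GNAME": "viz",
            "LOAD": 1.7,         # hypothetical load attribute name
        },
    }
    for job_id, attrs in services.items():
        sys.stdout.write(format_job_stanza(job_id, attrs) + "\n")


if __name__ == "__main__":
    main()
```

Since the script takes no arguments, all state it reports (health, load, ownership) must be gathered from the service itself on each invocation.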

Using JobModifyURL for Allocation/Deallocation

The job modify URL is called each time Moab adjusts a dynamic job's current resource allocation. Regardless of whether the call increases or decreases the resources allocated to the job, the URL is called with the following arguments:

Argument    Description
Command     For dynamic job modification, this is the string modify.
Attribute   For dynamic job modification, this is the string allocnodelist, indicating that the value is the current node list allocated to the job by Moab.
Value       The allocnodelist value: a comma-delimited list of host names.

The job modify URL can immediately contact and interface with the dynamic job application via sockets, a Web service, a database, or a flat file. It can issue UNIX signals, directly start and kill remote processes, or perform whatever other application-specific actions are required to inform the service of the resource change. The sample script tools/job.modify.dyn.pl merely writes the newly allocated node list to a flat file; the actual changes are then applied each iteration when the tools/job.query.dyn.pl script is executed. This approach allows the application/service to adjust its actual resource usage over time, approaching the resource list allocated by Moab as conditions permit.
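The flat-file approach used by the sample tool can be sketched as follows. This is a minimal sketch, assuming the three arguments (command, attribute, value) arrive positionally in that order; verify how your Moab version actually passes them. The state file path and the one-host-per-line format are hypothetical choices, not taken from job.modify.dyn.pl.

```python
from pathlib import Path
from typing import List, Sequence

# Hypothetical location of the flat file read back by the query script.
STATE_FILE = Path("/tmp/vizserver.allocnodelist")


def handle_modify(argv: Sequence[str], state_file: Path = STATE_FILE) -> List[str]:
    """Record Moab's new allocation so a later query cycle can act on it.

    ASSUMPTION: argv is <command> <attribute> <value>, matching the
    argument list above.
    """
    if len(argv) != 3:
        raise ValueError(f"expected 3 arguments, got {len(argv)}")
    command, attribute, value = argv
    if command != "modify" or attribute != "allocnodelist":
        raise ValueError(f"unsupported request: {command} {attribute}")
    # The value is a comma-delimited host list; drop empty entries.
    hosts = [h.strip() for h in value.split(",") if h.strip()]
    # Write to a temporary file and rename it so that a concurrent reader
    # never sees a partially written host list.
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text("\n".join(hosts) + "\n")
    tmp.replace(state_file)
    return hosts
```

An exec-type wrapper would call handle_modify(sys.argv[1:]) from its entry point and exit non-zero on ValueError so the failure surfaces in Moab's resource manager diagnostics.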

Peer Client Interfaces - Incoming Job-to-Moab Requests

In the simplest case, no explicit job-to-Moab requests are generated. Each job merely reports its current load, and Moab is configured to auto-adjust the resources allocated to the job to keep this load within a targeted range. However, if explicit resource allocation is required, the application can either express this requirement via the TASKSREQUESTED resource manager extension, which Moab loads at the next workload query cycle, or request the change immediately and directly using the mjobctl -m command.

21.3.2.3 Moab Policy Configuration for Dynamic Job Management

Moab provides a number of policies that allow sites to specify how conflicting resource management requirements are resolved and how job load levels are translated into priority adjustments and allocation decisions.

NODERANGE
Specifies the minimum and maximum number of nodes that can be allocated to a dynamic job.

JOBCFG[db] NODERANGE=4,16

PRIORITY
Specifies the priority of the workload/service and thus the order in which resource allocations are adjusted.

JOBCFG[db] PRIORITY=1600

TARGETBACKLOG
Specifies the low and high backlog thresholds that cause Moab to automatically adjust allocated resources.

JOBCFG[db] TARGETBACKLOG=1000,200000

Moab will allocate additional resources if the backlog exceeds 200000 transactions and will release resources if the backlog drops below 1000 transactions.

TARGETLOAD
Specifies the low and high load thresholds that cause Moab to automatically adjust allocated resources.

JOBCFG[db] TARGETLOAD=0.5,2.0

Moab will allocate additional resources if the load exceeds 2.0 and will release resources if the load drops below 0.5.

TARGETRESPONSETIME
Specifies the low and high response time thresholds (in seconds) that cause Moab to automatically adjust allocated resources.

JOBCFG[db] TARGETRESPONSETIME=0.1,1.2

Moab will allocate additional resources if the response time exceeds 1.2 seconds and will release resources if the response time drops below 100 milliseconds.
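The threshold policies above share one rule: grow when the metric exceeds the high value, shrink when it drops below the low value, and never leave NODERANGE. The sketch below models that rule only; the fixed one-node step is a simplifying assumption, and Moab's real decision also weighs PRIORITY and competing requests.

```python
from typing import Tuple


def adjust_nodes(metric: float, target: Tuple[float, float],
                 nodes: int, node_range: Tuple[int, int], step: int = 1) -> int:
    """Return a new node count for a dynamic job.

    target is (low, high) as in TARGETLOAD, TARGETBACKLOG, or
    TARGETRESPONSETIME; node_range is (min, max) as in NODERANGE.
    ASSUMPTION: a fixed step per adjustment, for illustration only.
    """
    low, high = target
    min_nodes, max_nodes = node_range
    if low > high or min_nodes > max_nodes:
        raise ValueError("ranges must be given as (low, high)")
    if metric > high:        # e.g. load exceeds 2.0: allocate more
        nodes += step
    elif metric < low:       # e.g. load drops below 0.5: release some
        nodes -= step
    # Clamp to the NODERANGE bounds.
    return max(min_nodes, min(max_nodes, nodes))
```

For example, with TARGETLOAD=0.5,2.0 and NODERANGE=4,16, a load of 2.5 on 8 nodes yields 9 nodes, while the same load on 16 nodes stays at 16.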

21.3.2.4 Diagnostics and Directly Managing Dynamic Job Allocations

Using the mjobctl command, specific or general resources can be allocated to or deallocated from dynamic jobs directly. In particular, the -m (modify) flag can be used to change the job's allocnodelist attribute and thereby adjust its resource allocation.

Below are several examples of using mjobctl -m to update the list of nodes allocated to a job. The first example sets the nodelist, the second adds a new node to the existing list, and the third removes one of those nodes from the list.

> mjobctl -m allocnodelist=node001,node002 job.11

> mjobctl -m allocnodelist+=node003 job.11

> mjobctl -m allocnodelist-=node002 job.11
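The three commands above set, extend, and shrink the allocated node list. The sketch below models their effect on a list and builds the matching argument vector; it is an illustration of the examples, not Moab code, and the ordering and duplicate handling are assumptions.

```python
from typing import List, Sequence

OPERATORS = ("=", "+=", "-=")


def apply_nodelist_op(current: Sequence[str], op: str, nodes: Sequence[str]) -> List[str]:
    """Model '=', '+=' and '-=' on an allocated node list (illustrative)."""
    if op == "=":
        return list(dict.fromkeys(nodes))            # replace; drop duplicates
    if op == "+=":
        return list(dict.fromkeys([*current, *nodes]))  # append new nodes only
    if op == "-=":
        removed = set(nodes)
        return [n for n in current if n not in removed]
    raise ValueError(f"unknown operator {op!r}")


def mjobctl_argv(job_id: str, op: str, nodes: Sequence[str]) -> List[str]:
    """Build the argument vector for one of the mjobctl -m calls shown above."""
    if op not in OPERATORS:
        raise ValueError(f"unknown operator {op!r}")
    return ["mjobctl", "-m", f"allocnodelist{op}{','.join(nodes)}", job_id]
```

On a host where mjobctl is installed, the vector could be passed to Python's subprocess.run(argv, check=True); building it as a list avoids shell quoting issues with host names.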

The checkjob command can be used to evaluate current dynamic job allocations, historical load levels and thresholds, and other factors relevant to dynamic jobs.

The mdiag -R -v command will report issues seen within the native resource manager interface tools used to monitor and control the dynamic jobs and services.

21.3.3 Malleable Jobs

Malleable jobs are jobs whose required resources and duration can be adjusted, allowing the scheduler to maximize job responsiveness by selecting a job's resource shape or footprint before the job starts. Once a job has started, however, its resource footprint is fixed until job completion.

To enable malleable jobs, the underlying resource manager must support dynamic modification of resource requirements prior to execution (e.g., TORQUE), and jobs must be submitted using the TRL (task request list) resource manager extension string. With the TRL attribute specified, Moab attempts to select a start time and resource footprint that minimize job completion time and maximize overall effective system utilization (i.e., <AverageJobEfficiency> * <AverageSystemUtilization>).

Example

With the following job submission, Moab will execute the job in one of the following configurations: 1 node for 1 hour, 2 nodes for 30 minutes, or 4 nodes for 15 minutes.

> qsub -l nodes=1,trl=1@3600:2@1800:4@900 testjob.cmd

job 72436.orion submitted
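The TRL string in this submission lists alternative shapes as <count>@<seconds> entries separated by colons. The sketch below parses such a string and picks the shape that finishes earliest. The earliest_start mapping is a hypothetical stand-in for the scheduler's availability data, and the sketch ignores the utilization term Moab also optimizes.

```python
from typing import Dict, List, Tuple


def parse_trl(trl: str) -> List[Tuple[int, int]]:
    """Parse a TRL string such as '1@3600:2@1800:4@900' into (count, seconds)."""
    shapes = []
    for entry in trl.split(":"):
        count, _, seconds = entry.partition("@")
        if not count or not seconds:
            raise ValueError(f"malformed TRL entry: {entry!r}")
        shapes.append((int(count), int(seconds)))
    return shapes


def pick_shape(shapes: List[Tuple[int, int]],
               earliest_start: Dict[int, int]) -> Tuple[int, int, int]:
    """Choose the shape with the earliest completion time.

    earliest_start maps a node count to the earliest time (seconds from now)
    at which that many nodes are free; this input is hypothetical.
    Returns (count, walltime, start).
    """
    candidates = [
        (earliest_start[count] + seconds, count, seconds)
        for count, seconds in shapes
        if count in earliest_start
    ]
    if not candidates:
        raise ValueError("no requested shape fits the available resources")
    _, count, seconds = min(candidates)  # ties favor the smaller footprint
    return count, seconds, earliest_start[count]
```

If 1 node is free now, 2 nodes in 600 seconds, and 4 nodes in 3000 seconds, the shapes complete at 3600, 2400, and 3900 seconds, so the 2-node, 30-minute shape is chosen even though it cannot start immediately.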

21.3.4 Case Studies

21.3.4.1 Dynamic Shared Visualization Cluster Case Study

Moab includes sample dynamic job tools that drive a hypothetical cluster sharing resources between a dynamic visualization service and standard batch workload. The scripts tools/job.query.dyn.pl and tools/job.modify.dyn.pl provide job monitoring and job modification services, respectively.

In this environment, cluster managers want the visualization service to grow and shrink according to current load while staying inside the portion of the cluster whose nodes have the viz feature. They also want to maximize cluster efficiency by aggressively scheduling batch jobs onto unused viz resources. The following configuration balances visualization and batch workload on the cluster.

# moab.cfg

SCHEDCFG[viz] MODE=NORMAL SERVER=viz.org:42000

RMCFG[batch] TYPE=PBS # TORQUE
RMCFG[viz]   TYPE=NATIVE WORKLOADQUERYURL=exec://$TOOLSDIR/job.query.dyn.pl
RMCFG[viz]   JOBMODIFYURL=exec://$TOOLSDIR/job.modify.dyn.pl

QOSCFG[viz]  PRIORITY=1000 QFLAGS=DYNAMIC MEMBERULIST=viz

JOBCFG[vizserver] NODERANGE=2,64 REQFEATURES=viz UNAME=viz GNAME=viz
JOBCFG[vizserver] QOS=viz TARGETLOAD=0.5,2.0

SRCFG[viz] HOSTLIST=ALL NODEFEATURE=viz PERIOD=INFINITY
SRCFG[viz] USERLIST=viz CLASSLIST=debug 
SRCFG[viz] OWNER=user:viz FLAGS=OWNERPREEMPT

This configuration accomplishes several things. First, Moab is configured to obtain information both from a TORQUE-based resource manager and from the native resource manager controlling the visualization server. Second, the viz QoS is defined to allow dynamic jobs and to give viz jobs a high priority relative to other jobs. Next, the JOBCFG parameter places bounds on the allowed size of the vizserver job and makes it dynamic by associating it with the viz QoS. The JOBCFG parameter also specifies the TARGETLOAD attribute, which Moab uses to translate job load into adjustments in allocated nodes.

Finally, the preceding configuration creates a permanent standing reservation mapped to all nodes that have the viz feature. This standing reservation is owned by the viz user and allows both the viz job and debug class jobs to use the reserved resources. However, if the visualization service ever needs to grow and debug jobs are in the way, the OWNERPREEMPT flag allows Moab to preempt the debug jobs to free up resources for the viz job.

With this configuration and setup, Moab is able to fully control and load balance the cluster. The site administrator is able to customize exactly how Moab scheduling decisions interface with the visualization server by changing the native resource manager interface scripts. In the default case, each time Moab adjusts the size of the dynamic visualization job, the job.modify.dyn.pl script simply writes the host list which Moab is currently dedicating to the viz job into a flat text file. Then, each iteration when Moab executes the job.query.dyn.pl script, this script compares the nodes currently allocated by the visualization server against the list of nodes provided by Moab (as shown in the flat text file) and adjusts server behavior as needed. It then reports back basic job health information and current load that Moab uses to make subsequent load balancing decisions.
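The comparison step performed on each query iteration can be sketched as a set difference between the hosts the service currently uses and the hosts Moab has allocated. This is a minimal sketch, assuming the flat file holds one host name per line (a hypothetical format); how the service actually starts or stops work on a host is site-specific and left out.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple


def read_allocation(state_file: Path) -> Set[str]:
    """Read the host list last written by the modify script.

    ASSUMPTION: one host name per line; a missing file means no allocation.
    """
    if not state_file.exists():
        return set()
    return {line.strip() for line in state_file.read_text().splitlines() if line.strip()}


def reconcile(in_use: Iterable[str], allocated: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (hosts_to_start, hosts_to_stop), sorted for stable output."""
    current = set(in_use)
    return sorted(allocated - current), sorted(current - allocated)
```

Because the result is recomputed from scratch each iteration, a host that could not be released yet simply reappears in hosts_to_stop on the next pass, which matches the gradual convergence described above.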

21.3.4.2 Generalized Dynamic Shared Visualization Cluster Case Study

The previous example showed how to load-balance an external service with batch or other types of workload. In some environments, however, it may be desirable to integrate these services more tightly with the resource manager. Benefits of this tighter integration include the following:

  • resource manager level statistics and accounting records
  • resource manager based service launch and termination
  • support for non-root based services
  • ability to expand services to arbitrary resources via generalized batch scripts

In this model, the scheduler is responsible for identifying available resources and launching/terminating specific slave jobs as needed to properly load-balance cluster workload. Each master application job tracks its multiple child slave jobs and represents aggregate usage for all slaves.

SCHEDCFG[viz] MODE=NORMAL SERVER=viz.org:42000

RMCFG[batch] TYPE=PBS # TORQUE
RMCFG[viz]   TYPE=NATIVE WORKLOADQUERYURL=exec://$TOOLSDIR/job.query.dyn.pl
RMCFG[viz]   JOBMODIFYURL=exec://$TOOLSDIR/job.modify.dyn.pl

QOSCFG[viz]  PRIORITY=1000 QFLAGS=DYNAMIC MEMBERULIST=viz

JOBCFG[vizserver] NODERANGE=2,64  REQFEATURES=viz  UNAME=viz  GNAME=viz
JOBCFG[vizserver] QOS=viz  TARGETLOAD=0.5,2.0  SLAVESCRIPT=file:///opt/tools/launchviz.cmd

SRCFG[viz] HOSTLIST=ALL NODEFEATURE=viz PERIOD=INFINITY
SRCFG[viz] USERLIST=viz CLASSLIST=debug
SRCFG[viz] OWNER=user:viz FLAGS=OWNERPREEMPT

The only change from the previous configuration is the addition of the SLAVESCRIPT attribute. When it is specified and a dynamic application must grow, Moab submits and launches a slave job through the resource manager. When resources are to be released, Moab cancels the excess slave jobs.
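The master/slave bookkeeping in this model can be sketched as follows: the master sums the usage of its slaves and works out how many slaves to submit or which to cancel for a target size. This is a hypothetical sketch, assuming one slave job per node; the choice to cancel the least-used slaves first is an illustrative policy, not documented Moab behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MasterJob:
    """Hypothetical master job tracking its slave jobs.

    ASSUMPTION: each slave job occupies one node. Keys are slave job IDs
    as returned by the resource manager; values are per-slave usage.
    """
    slave_usage: Dict[str, float] = field(default_factory=dict)

    def aggregate_usage(self) -> float:
        # The master represents the combined usage of all of its slaves.
        return sum(self.slave_usage.values())

    def plan(self, target_nodes: int) -> Tuple[int, List[str]]:
        """Return (slaves_to_submit, slave_ids_to_cancel) to reach target_nodes."""
        if target_nodes < 0:
            raise ValueError("target_nodes must be non-negative")
        current = len(self.slave_usage)
        if target_nodes >= current:
            return target_nodes - current, []
        # Cancel the least-used slaves first (illustrative policy).
        by_usage = sorted(self.slave_usage, key=lambda jid: self.slave_usage[jid])
        return 0, by_usage[: current - target_nodes]
```

Here the submit count would correspond to launching additional copies of the SLAVESCRIPT job, and the cancel list to the excess slave jobs that are terminated when resources are released.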

21.3.4.3 Advanced Dynamic Jobs with Grid and OnDemand Co-Allocation Integration Case Study

Previous examples highlighted the use of dynamic jobs within a self-contained cluster. However, nothing prevents their use in managing services, batch jobs, and other workload in a grid or on-demand utility computing environment. In the following example, a customer cluster is connected to a grid and two independent utility computing centers and manages a combination of batch, calendar, interactive, and service workload. Some of the managed services are persistent, while others are batch based. The following Moab configuration allows critical services and applications to grow according to load and to allocate local and on-demand resources as needed to reach specified performance targets.

SCHEDCFG[datacenter-houston] MODE=NORMAL SERVER=houston1.globalcom.com:42000

# interface to local cluster resource manager
RMCFG[batch] TYPE=PBS # TORQUE

# interface to monitor/drive persistent and batch services
RMCFG[service]   TYPE=NATIVE WORKLOADQUERYURL=exec://$TOOLSDIR/job.query.dyn.pl
RMCFG[service]   JOBMODIFYURL=exec://$TOOLSDIR/job.modify.dyn.pl

# define grid interface to Tokyo data center
RMCFG[datacenter-tokyo] TYPE=moab SERVER=tokyo1.globalcom.com

# interface to internal ondemand, dynamically provisioned hosting center
RMCFG[od-globalcom] TYPE=moab SERVER=uc.dallas.globalcom.com FLAGS=hostingcenter

# interface to external commercial ondemand hosting centers
RMCFG[rsystems]      TYPE=moab FLAGS=hostingcenter SERVER=www.rsystems.com
RMCFG[computesource] TYPE=moab FLAGS=hostingcenter SERVER=odmaster.computesource.com

# define web service - allow use of any accessible resources
QOSCFG[web]  PRIORITY=4000  QFLAGS=DYNAMIC,ONDEMAND MEMBERULIST=web

JOBCFG[webserver] NODERANGE=0,1024  UNAME=web  GNAME=web
JOBCFG[webserver] QOS=web  TARGETBACKLOG=0.2,3.0  

# define visualization service - allow use of only houston and dallas resources
QOSCFG[viz]  PRIORITY=1000  QFLAGS=DYNAMIC MEMBERULIST=viz

JOBCFG[vizserver] NODERANGE=2,64  REQFEATURES=viz  UNAME=viz  GNAME=viz  PLIST=houston,dallas
JOBCFG[vizserver] QOS=viz  TARGETLOAD=0.5,2.0  SLAVESCRIPT=file:///opt/tools/launchviz.cmd

SRCFG[viz] HOSTLIST=ALL NODEFEATURE=viz PERIOD=INFINITY
SRCFG[viz] USERLIST=viz CLASSLIST=debug
SRCFG[viz] OWNER=user:viz FLAGS=OWNERPREEMPT
