21.3 Dynamic and Malleable Jobs
Moab supports both dynamic and malleable jobs, allowing internal and external job steering, management of autonomous job pools, and other advanced capabilities.
Dynamic jobs may be adjusted over their active lifetime, acquiring and releasing resources according to internal and external workload conditions. As a dynamic job's compute requirements grow, it can either explicitly request additional resources from Moab or simply report increased internal load. Moab considers this internal load, and the value (or priority) of the request relative to other requests, to determine whether the request can be satisfied.
If the request can be satisfied, Moab will allocate additional resources to the job and notify the job of the resource availability. If the dynamic job's load is reduced and other jobs could use additional resources, Moab will notify the job of the need to release a portion of the allocated resources.
By dynamically allocating resources to and reclaiming them from these jobs, Moab allows sites to effectively load balance compute resources among multiple competing consumers, enabling the mixing of static and dynamic cluster jobs with the loads of web, database, and visualization server farms.
Dynamic jobs are enabled by using the DYNAMIC flag in moab.cfg.
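For example, the configurations later in this section attach the flag per QoS:

QOSCFG[viz]  QFLAGS=DYNAMIC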
Service-based dynamic jobs are often not considered batch jobs at all. Rather, they are consumers of compute resources, such as parallel databases or web service applications, that generally have dedicated resources associated with them. Within Moab, these services can be encapsulated as a job and associated with ownership, priority, and resource requirement attribute information. Using this information, Moab can intelligently distribute available resources across different request types to satisfy site mission objectives.
To manage a service as a dynamic job, a few key interfaces must be enabled. These interfaces allow Moab to monitor the service's state and load, to adjust the service's resource allocation, and to receive explicit resource requests from the service itself.
Using some or all of these interfaces, independent services can be set up that fully manage and schedule workload inside the pool of allocated resources. This allows high-speed, transaction-based services that use context-sensitive algorithms to optimize and load-balance incoming requests with no overhead from the batch system. The external batch system interoperates with the service only at a coarse-grained level, balancing the needs of multiple services and standard batch workload.
The native resource manager interface allows sites to envelop a new or existing service with the interfaces required to drive it as an integrated dynamic job using scripts or Web- or SQL-based services. To fully integrate a service as a dynamic job, the interfaces described in the following sections can be used.
Resource Manager Interfaces - Outgoing Moab to Job Requests
Using WorkloadQueryURL for Obtaining Service State Information
The workload query URL is called each iteration and is responsible for reporting back basic information in the form of a job stanza for each service/application that is being monitored. This information should include job ID (which should be synchronized with any corresponding JOBCFG profiles specified in moab.cfg), current load information, job ownership (such as user, group, account), and optional resource constraint information to indicate what type of resources can be allocated to the job.
The workload query URL receives no input arguments; in the case of an exec-type native resource manager interface, it reports all output to STDOUT. The script tools/job.query.dyn.pl is included in the Moab tools directory and provides an example of how to report application information as a job.
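As a rough illustration of this reporting pattern, a minimal query script might print one attribute=value stanza per monitored service. This is a sketch, not the shipped tool: the job ID, load figures, and exact attribute set here are illustrative assumptions.

#!/usr/bin/perl
# minimal sketch in the spirit of tools/job.query.dyn.pl: report each
# monitored service to Moab as a one-line job stanza on STDOUT
use strict;
use warnings;

# in a real script these values would be read from the service itself;
# the job ID must match any corresponding JOBCFG profile in moab.cfg
my $jobid = 'vizserver.1';
my %attr  = (
    STATE => 'Running',   # current job state
    UNAME => 'viz',       # owning user
    GNAME => 'viz',       # owning group
    TASKS => 4,           # tasks currently in use
);

print "$jobid ", join(';', map { "$_=$attr{$_}" } sort keys %attr), "\n";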
Using JobModifyURL for Allocation/Deallocation
The jobmodify URL is called each time Moab adjusts a dynamic job's current resource allocation. Whether the call increases or decreases the resources allocated to the job, the URL is invoked with arguments identifying the job and describing the requested allocation change.
The jobmodify URL handler can immediately contact and interface with the dynamic job application via sockets, a Web service, a database, or a flat file. It can issue UNIX signals, directly start and kill remote processes, or perform whatever other application-specific actions are required to inform the service of the resource change. In the sample tool tools/job.modify.dyn.pl, the script merely writes the new allocated node list to a flat file; the actual changes are then considered each iteration when the tools/job.query.dyn.pl script is executed. This approach allows the application/service to adjust its actual resource usage over time, converging on the resource list allocated by Moab as conditions permit.
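The following is a minimal sketch of that flat-file handoff. The argument format and file path are assumptions made for illustration; see the shipped tools/job.modify.dyn.pl for the actual interface.

#!/usr/bin/perl
# sketch of the flat-file approach described above: record the node list
# Moab has allocated so the service can converge on it over time
use strict;
use warnings;

my ($jobid, $nodelist) = @ARGV;   # e.g. "vizserver.1" "node001,node003"
die "usage: $0 <jobid> <nodelist>\n" unless defined $nodelist;

# hypothetical spool location; the service polls this file on its own schedule
open my $fh, '>', "/var/spool/viz/$jobid.nodes"
    or die "cannot write allocation file: $!";
print {$fh} "$nodelist\n";
close $fh;

# tools/job.query.dyn.pl later compares the service's actual nodes against
# this file and reports progress back to Moab each scheduling iteration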
Peer Client Interfaces - Incoming Job-to-Moab Requests
In the simplest cases, there are no explicit job-to-Moab requests at all: each job merely reports its current load, and Moab is configured to auto-adjust the allocated resources to keep this load within a targeted range. However, if explicit resource allocation is required, the application can either express this requirement via the TASKSREQUESTED resource manager extension, which Moab will pick up at the next workload query cycle, or request the change immediately and directly using the mjobctl -m command.
Moab provides a number of policies that allow sites to specify how conflicting resource requests are resolved and how job load levels are translated into priority adjustments and allocation decisions.
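One such knob appears in the examples later in this section: the TARGETLOAD attribute of JOBCFG gives Moab a per-job load window to maintain by growing or shrinking the job's allocation.

JOBCFG[vizserver]  TARGETLOAD=0.5,2.0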
Using the mjobctl command, specific or general resources can be directly allocated to or deallocated from dynamic jobs. In particular, the -m (modify) flag can be used to adjust the job's allocnodelist attribute, changing its resource allocation directly.
Below are several examples of using mjobctl -m to update the list of nodes allocated to a job. The first example sets the nodelist, the second adds a new node to the existing list, and the third removes one of those nodes from the list.
> mjobctl -m allocnodelist=node001,node002 job.11
> mjobctl -m allocnodelist+=node003 job.11
> mjobctl -m allocnodelist-=node002 job.11
The checkjob command can be used to evaluate current dynamic job allocations, historical load levels and thresholds, and other factors relevant to dynamic jobs.
The mdiag -R -v command will report issues seen within the native resource manager interface tools used to monitor and control the dynamic jobs and services.
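For example (the job and resource manager IDs shown are illustrative):

> checkjob -v vizserver.1
> mdiag -R -v viz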
Malleable jobs are jobs whose required resources and duration can be adjusted, allowing the scheduler to maximize job responsiveness by selecting a job's resource shape, or footprint, prior to job execution. Once a job has started, however, its resource footprint is fixed until job completion.
To enable malleable jobs, the underlying resource manager must support dynamic modification of resource requirements prior to execution (e.g., TORQUE), and the jobs must be submitted using the TRL (task request list) resource manager extension string. With the TRL attribute specified, Moab attempts to select a start time and resource footprint that minimize job completion time and maximize overall effective system utilization (i.e., <AverageJobEfficiency> * <AverageSystemUtilization>).
With the following job submission, Moab will execute the job in one of the following configurations: 1 node for 1 hour, 2 nodes for 30 minutes, or 4 nodes for 15 minutes.
> qsub -l nodes=1,trl=1@3600:2@1800:4@900 testjob.cmd

job 72436.orion submitted
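Note that all three shapes request the same total node time (1 x 3600 = 2 x 1800 = 4 x 900 node-seconds); given current availability, Moab selects whichever shape can start and complete earliest.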
Moab includes sample dynamic job tools for driving a hypothetical cluster that shares resources between a dynamic visualization service and standard batch workload. The scripts tools/job.query.dyn.pl and tools/job.modify.dyn.pl provide job monitoring and job modification services, respectively.
In this environment, cluster managers want the visualization service to grow and shrink according to current load, but only within the portion of the cluster whose nodes have the viz feature. They also want to maximize cluster efficiency by scheduling batch jobs aggressively onto unused viz resources. Using the following configuration, the cluster balances visualization and batch workload.
# moab.cfg

SCHEDCFG[viz]      MODE=NORMAL SERVER=viz.org:42000

RMCFG[batch]       TYPE=PBS  # TORQUE
RMCFG[viz]         TYPE=NATIVE WORKLOADQUERYURL=exec://$TOOLSDIR/job.query.dyn.pl
RMCFG[viz]         JOBMODIFYURL=exec://$TOOLSDIR/job.modify.dyn.pl

QOSCFG[viz]        PRIORITY=1000 QFLAGS=DYNAMIC MEMBERULIST=viz

JOBCFG[vizserver]  NODERANGE=2,64 REQFEATURES=viz UNAME=viz GNAME=viz
JOBCFG[vizserver]  QOS=viz TARGETLOAD=0.5,2.0

SRCFG[viz]         HOSTLIST=ALL NODEFEATURE=viz PERIOD=INFINITY
SRCFG[viz]         USERLIST=viz CLASSLIST=debug
SRCFG[viz]         OWNER=user:viz FLAGS=OWNERPREEMPT
This configuration accomplishes several things. First, Moab is configured to obtain information both from a TORQUE-based resource manager and from the native resource manager controlling the visualization server. Second, the viz QoS is defined to allow dynamic jobs and to give viz jobs a high priority relative to other jobs. Next, the JOBCFG parameter places bounds on the allowed size of the vizserver job and allows it to be dynamic by associating it with the viz QoS. The JOBCFG parameter also specifies the TARGETLOAD attribute, which Moab uses to translate job load into adjustments in allocated nodes.
Finally, the preceding configuration creates a permanent standing reservation that is mapped to all nodes that have a feature of viz. This standing reservation is owned by the viz user and allows both the viz job and debug-class jobs to use the reserved resources. However, if the visualization cluster ever needs to grow and debug jobs are in the way, the OWNERPREEMPT flag allows Moab to preempt the debug jobs to free up resources for the viz job.
With this configuration and setup, Moab is able to fully control and load balance the cluster. The site administrator is able to customize exactly how Moab scheduling decisions interface with the visualization server by changing the native resource manager interface scripts. In the default case, each time Moab adjusts the size of the dynamic visualization job, the job.modify.dyn.pl script simply writes the host list which Moab is currently dedicating to the viz job into a flat text file. Then, each iteration when Moab executes the job.query.dyn.pl script, this script compares the nodes currently allocated by the visualization server against the list of nodes provided by Moab (as shown in the flat text file) and adjusts server behavior as needed. It then reports back basic job health information and current load that Moab uses to make subsequent load balancing decisions.
While the previous example examined how to load-balance an external service against batch and other types of workload, in some environments it may be desirable to integrate these services more tightly with the resource manager so that the service's resource usage is represented, tracked, and accounted for as normal resource manager jobs.
In this model, the scheduler is responsible for identifying available resources and launching/terminating specific slave jobs as needed to properly load-balance cluster workload. Each master application job tracks its multiple child slave jobs and represents aggregate usage for all slaves.
SCHEDCFG[viz]      MODE=NORMAL SERVER=viz.org:42000

RMCFG[batch]       TYPE=PBS  # TORQUE
RMCFG[viz]         TYPE=NATIVE WORKLOADQUERYURL=exec://$TOOLSDIR/job.query.dyn.pl
RMCFG[viz]         JOBMODIFYURL=exec://$TOOLSDIR/job.modify.dyn.pl

QOSCFG[viz]        PRIORITY=1000 QFLAGS=DYNAMIC MEMBERULIST=viz

JOBCFG[vizserver]  NODERANGE=2,64 REQFEATURES=viz UNAME=viz GNAME=viz
JOBCFG[vizserver]  QOS=viz TARGETLOAD=0.5,2.0 SLAVESCRIPT=file:///opt/tools/launchviz.cmd

SRCFG[viz]         HOSTLIST=ALL NODEFEATURE=viz PERIOD=INFINITY
SRCFG[viz]         USERLIST=viz CLASSLIST=debug
SRCFG[viz]         OWNER=user:viz FLAGS=OWNERPREEMPT
Note that the only change is the presence of the SLAVESCRIPT attribute. If specified, when a dynamic application must grow, Moab will submit and launch additional slave jobs via the resource manager; when resources are to be released, Moab will cancel the excess slave jobs.
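The contents of launchviz.cmd are site supplied. As a purely hypothetical sketch (the vizslaved daemon and its options are invented for illustration), a slave script might simply start one service worker on whatever node the slave job lands on:

#!/usr/bin/perl
# hypothetical slave job script (launchviz.cmd) submitted by Moab when the
# vizserver job must grow; each slave joins the visualization service
exec '/opt/viz/bin/vizslaved', '--join', 'viz.org:42000'
    or die "failed to launch viz slave: $!";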
The previous examples highlighted the use of dynamic jobs within a self-contained cluster, but nothing prevents their use for managing services, batch jobs, and other workload in a grid or on-demand utility computing environment. In the following example, a customer cluster is connected to a grid and two independent utility computing centers, and manages a combination of batch, calendar, interactive, and service workload. A subset of the managed services are persistent while others are batch based. The following Moab configuration allows critical services and applications to grow according to load, allocating local and on-demand resources as needed to reach specified performance targets.
SCHEDCFG[datacenter-houston]  MODE=NORMAL SERVER=houston1.globalcom.com:42000

# interface to local cluster resource manager
RMCFG[batch]             TYPE=PBS  # TORQUE

# interface to monitor/drive persistent and batch services
RMCFG[service]           TYPE=NATIVE WORKLOADQUERYURL=exec://$TOOLSDIR/job.query.dyn.pl
RMCFG[service]           JOBMODIFYURL=exec://$TOOLSDIR/job.modify.dyn.pl

# define grid interface to Tokyo data center
RMCFG[datacenter-tokyo]  TYPE=moab SERVER=tokyo1.globalcom.com

# interface to internal ondemand, dynamically provisioned hosting center
RMCFG[od-globalcom]      TYPE=moab SERVER=uc.dallas.globalcom.com FLAGS=hostingcenter

# interface to external commercial ondemand hosting centers
RMCFG[rsystems]          TYPE=moab FLAGS=hostingcenter SERVER=www.rsystems.com
RMCFG[computesource]     TYPE=moab FLAGS=hostingcenter SERVER=odmaster.computesource.com

# define web service - allow use of any accessible resources
QOSCFG[web]              PRIORITY=4000 QFLAGS=DYNAMIC,ONDEMAND MEMBERULIST=web
JOBCFG[webserver]        NODERANGE=0,1024 UNAME=web GNAME=web
JOBCFG[webserver]        QOS=web TARGETBACKLOG=0.2,3.0

# define visualization service - allow use of only houston and dallas resources
QOSCFG[viz]              PRIORITY=1000 QFLAGS=DYNAMIC MEMBERULIST=viz
JOBCFG[vizserver]        NODERANGE=2,64 REQFEATURES=viz UNAME=viz GNAME=viz PLIST=houston,dallas
JOBCFG[vizserver]        QOS=viz TARGETLOAD=0.5,2.0 SLAVESCRIPT=file:///opt/tools/launchviz.cmd
SRCFG[viz]               HOSTLIST=ALL NODEFEATURE=viz PERIOD=INFINITY
SRCFG[viz]               USERLIST=viz CLASSLIST=debug
SRCFG[viz]               OWNER=user:viz FLAGS=OWNERPREEMPT