Grid Basics
Moab Workload Manager® for Grids

17.1 Grid Basics

17.1.1 Grid Overview

A grid enables you to exchange workload and resource status information and to distribute jobs and data among clusters in an established relationship. In addition, you can use resource reservations to mask reported resources, coordinate requests for consumable resources, and enable multi-cluster co-allocation and quality of service guarantees.

In a grid, some servers running Moab are a source for jobs (that is, where users, portals, and other systems submit jobs), while other servers running Moab are a destination for these jobs (that is, where the jobs execute). Thus, jobs originate from a source server and move to a destination server. For a source server to make an intelligent decision, though, resource availability information must flow from a destination server to that source server.

Because you can manage workload on both the source and destination side of a grid relationship, you have a high degree of control over exactly when, how, and where to execute workload.

17.1.2 Grid Benefits

Moab's peer-to-peer capabilities can be used for multiple purposes, including any of the following:

  • manage access to external shared resources
  • enable cluster monitoring information services
  • enable massive-scalability clusters
  • enable distributed grid computing

Of these, the most common use is the creation of grids to join multiple centrally managed, partially autonomous, or fully autonomous clusters. This section highlights the most common uses of grid technology and points to the sections that detail their configuration and management. Other sections cover the standard aspects of grid creation, including configuring peer relationships, enabling data staging, credential management, usage policies, and other factors.

17.1.3 Scalability

Much like a massive-scalability cluster, a massive-scalability grid allows organizations to overcome scalability limitations in resource managers, networks, message passing libraries, security middleware, file systems, and other forms of software and hardware infrastructure. Moab does this by allowing a single large set of resources to be broken into multiple smaller, more manageable clusters, which it then virtually reassembles. Moab becomes responsible for integrating the seams between the clusters and presenting a single-system image back to the end-users, administrators, and managers.

17.1.4 Resource Access

In some cases, the primary motivation for creating a grid is to aggregate resources of different types into a single system. This aggregation can do the following:

  1. allow parameter-sweep style job arrays (that is, multi-step jobs) to run a portion of the jobs on one architecture and a portion on another.
  2. allow co-allocation jobs that simultaneously require resources of two or more types.

A common example of a multi-architecture parameter-sweep job is a batch regression test suite that requires a portion of the tests to run on Red Hat 7.2i, a portion on SUSE 9.1, a portion on Myrinet nodes, and a portion on InfiniBand nodes. While it would be very difficult to create and manage a single cluster that simultaneously provided all of these configurations, Moab can be used to create and manage a single grid that spans multiple clusters as needed.

The co-allocation jobs referenced above are illustrated by the classic example of "I need 64 compute nodes, 16 visualization nodes, and a telescope." Moab can deliver these resources co-allocated and fully reserved. This brings all the resources together simultaneously, providing the holistic compute environment required to complete a given task.

17.1.5 Load-Balancing

While grids often have additional motivations, it is rare to have a grid created where increased total system utilization is not an objective. By aggregating the total pool of jobs requesting resources and increasing the pool of resources available to each job, Moab is able to improve overall system utilization, sometimes significantly. The biggest difficulty in managing multiple clusters is preventing inter-cluster policies and the cost of migration from overwhelming the benefits of decreased fragmentation losses. Even though remote resources may be available for immediate usage, migration costs can occur in the form of credential, job, or data staging and impose a noticeable loss in responsiveness on grid workload.

Moab provides tools to monitor and manage these costs and to report both cluster-level and grid-level performance.
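The trade-off described above, between resources that are free immediately on a remote cluster and the cost of migrating to them, can be illustrated with a minimal sketch. This is a toy model and not Moab's scheduling algorithm: the ClusterEstimate class, its fields, the best_cluster function, and all of the numbers are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterEstimate:
    """Hypothetical per-cluster estimate; not a Moab data structure."""
    name: str
    queue_wait_s: float    # estimated wait before the job could start
    staging_cost_s: float  # credential, job, and data staging time (0 if local)


def best_cluster(estimates):
    """Return the estimate with the earliest projected start time."""
    return min(estimates, key=lambda e: e.queue_wait_s + e.staging_cost_s)


# Illustrative numbers only: the remote cluster is idle, but staging is slow.
estimates = [
    ClusterEstimate("local", queue_wait_s=600.0, staging_cost_s=0.0),
    ClusterEstimate("remote", queue_wait_s=0.0, staging_cost_s=900.0),
]
choice = best_cluster(estimates)
print(choice.name)
```

In this model the job stays on the local cluster even though the remote cluster could start it immediately, because the assumed staging time exceeds the local queue wait. Lowering the staging cost below the local wait reverses the decision.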

17.1.6 Single System Image (SSI)

Another common benefit of grids is the simplicity associated with a resource pool based on a single system image. This simplicity tends to translate directly into increased productivity for end-users, administrators, and managers.

An SSI environment tends to increase the efficiency of end-users by minimizing human errors associated with porting a request from a known system to a less familiar system. Additionally, a grid's single point of access reduces the human overhead associated with monitoring and managing workload within multiple independent systems.

For system administrators, a single system image can reduce overhead, training time, and diagnostic time associated with managing a cluster. Furthermore, with Moab's peer-to-peer technology, no additional software layer is required to enable the grid and no new tools must be learned. No additional layers mean no additional failure points, and that is good for everyone involved.

Managers benefit from SSI by being able to pursue organization mission objectives globally in a more coordinated and unified manner. They are also able to monitor progress toward those objectives, and the effectiveness of resources in general, far more easily.

17.1.7 High Availability

A final benefit of grids is their ability to decrease the impact of failures. For some organizations, this benefit is a primary motivation: pulling together additional resources allows workload to continue to be processed even if some nodes, or even an entire cluster, become unavailable. Whether the resource unavailability is caused by node failures, network failures, systems middleware, systems maintenance, or other factors, a properly configured grid can reroute priority workload throughout the grid to execute on other compatible resources.

With grids, there are a number of important factors in high availability that should be considered:

  • enabling highly available job submission/job management interfaces
  • avoiding network failures with redundant routes to compute resources
  • handling partial failures
  • dynamically rerouting migrated jobs
  • dynamically restarting failed jobs

17.1.8 Grid Relationships

Before you construct a grid, determine how you want the clusters in the grid to interact. There are three types of relationships you can implement within the grid:

17.1.8.1 Centralized Management

The centralized management model allows users to submit jobs to a centralized source server running Moab. The source Moab server obtains full resource information from all clusters and makes intelligent scheduling decisions across all clusters. As needed, jobs and data are distributed to the remote clusters. The centralized management model is recommended for intra-organization grid environments when cluster autonomy is not as necessary.

Example of the Centralized Management Model

XYZ Research has three clusters—MOAB1, MOAB2, and MOAB3—running Moab and the TORQUE resource manager. They would like to submit jobs at a single location (cluster MOAB1) and have the jobs run on whichever cluster can provide the best responsiveness.

The desired behavior is essentially a master-slave relationship with the Moab servers on MOAB2 and MOAB3 configured to trust cluster MOAB1 and to execute in slave mode. MOAB1 is the central, or master, cluster. On MOAB1, resource managers point to the local TORQUE resource manager and to the Moab servers on cluster MOAB2 and cluster MOAB3.

With this configuration, XYZ Research may submit jobs to the master Moab server running on cluster MOAB1. This Moab server obtains full resource information from all three clusters and makes intelligent scheduling decisions and distributes jobs and data to the remote clusters.

The Moab servers running on clusters MOAB2 and MOAB3 are destinations, each behaving like a local resource manager that reports resource information. The Moab server running on MOAB1 is a source that loads and uses this resource information.

In the centralized management (master-slave) configuration, roles are clear. In other configurations, individual Moab servers may simultaneously act as sources to some clusters and destinations to others or as both a source and a destination to another cluster.

17.1.8.2 Source-Destination Management

As with the centralized management model, the source-destination model allows users to submit jobs to a centralized source server running Moab. However, in the source-destination model, clusters retain sovereignty, allowing local job submission. Thus, if communication between the source and destination clusters is interrupted, the destination cluster(s) can still run jobs locally.

In the source-destination model, the source Moab server obtains full resource information from all clusters and makes intelligent scheduling decisions across all clusters. As needed, jobs and data are distributed to the remote clusters. If preferred, a destination cluster may also serve as its own source; however, a destination cluster may not serve as a source to another destination cluster. The source-destination model is recommended for intra-organization grid environments in which clusters need to retain some autonomy.
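The topology rule stated above (a destination may act as its own source, but not as a source to another destination) can be expressed as a small consistency check. The following is a minimal sketch under stated assumptions: the invalid_links function and the (source, destination) pair representation are hypothetical and not part of Moab, and the cluster names are borrowed from the earlier XYZ Research example purely for illustration.

```python
def invalid_links(links):
    """Return the links that break the source-destination rule.

    `links` is a list of (source, destination) pairs. A pair whose source
    equals its destination models a cluster acting as its own source, which
    is allowed. A link is invalid when its source is also the destination
    of a link from a different server.
    """
    destinations = {dst for src, dst in links if src != dst}
    return [
        (src, dst)
        for src, dst in links
        if src != dst and src in destinations
    ]


# MOAB1 is the source; MOAB2 also accepts local submissions (its own source).
valid = [("MOAB1", "MOAB2"), ("MOAB1", "MOAB3"), ("MOAB2", "MOAB2")]
print(invalid_links(valid))

# Adding MOAB2 as a source for MOAB3 breaks the rule.
broken = valid + [("MOAB2", "MOAB3")]
print(invalid_links(broken))
```

The first call reports no violations. The second reports the MOAB2 to MOAB3 link, because MOAB2 is already a destination of MOAB1.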

17.1.8.3 Local Management

The local management model allows you to submit jobs on one cluster (MOAB1, for example) and schedule the jobs on another cluster (MOAB3, for example). Jobs can also migrate in the opposite direction (that is, from MOAB3 to MOAB1, for example). The source server(s) running Moab obtain full resource information from all clusters and make intelligent scheduling decisions across all clusters. As needed, jobs and data are migrated to other clusters.

This model allows clusters to retain their autonomy while still allowing jobs to run on any cluster. No central location for job submission is needed, and you do not need to submit jobs from different nodes based on resource needs. You can submit a job from any location, and it is migrated either to nodes on the best utilized cluster or to the nodes specified in the job submission. This model is recommended for grids in an inter-organization grid environment.

17.1.9 Submitting Jobs to the Grid

In any peer-to-peer or grid environment where jobs must be migrated between clusters, use the Moab msub command. This command can effectively emulate the job submission capabilities of other common resource management systems such as PBS, LSF, or LoadLeveler. (Support for SGE and Condor is currently under development.) This feature allows existing job submit scripts for these resource managers to be submitted to and parsed directly by Moab.

Once submitted, Moab is able to identify potential destination resources and, if needed, translate the submitted job to the destination language. This effectively allows users to submit jobs using the submission language they are most comfortable with and have the job run on whichever resources are available first.

Example 1

A small pharmaceutical company, BioGen, runs two clusters in a master-slave relationship. The slave is an older IBM cluster running LoadLeveler, while the master manages the slave and also directly manages a large Linux cluster running TORQUE. A new user familiar with LSF has multiple LSF job scripts he would like to continue using. To enable this, the administrators create a symbolic link named bsub that points to the Moab msub client. The user begins submitting his jobs via bsub and, according to availability, the jobs run on either the LoadLeveler or the TORQUE cluster.

NOTE: Moab can stage or migrate between resource managers (that is, between clusters) only those jobs that were submitted using the msub command. If jobs are submitted directly to a low-level resource manager such as PBS or SGE, Moab can still schedule them, but only on resources directly managed by the resource manager to which they were submitted.

Using Moab's msub job submission command, jobs may be submitted using PBS, LSF, or LoadLeveler command file syntax and run on any system using any of these resource managers (support for SGE and Condor is currently under development). For example, a PBS job script may be submitted using msub and, depending on availability, Moab may translate this job and execute it on an LSF cluster.
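The migration rule in the NOTE above can be summarized as a simple eligibility check. This is an illustrative sketch only: the candidate_resource_managers function and its parameters are hypothetical, the resource manager names are placeholders, and the sketch does not call Moab or any resource manager.

```python
def candidate_resource_managers(submitted_via, submitted_to, all_rms):
    """Return the resource managers on which a job may be scheduled.

    submitted_via: "msub", or the name of a native submit command.
    submitted_to:  the resource manager that received a direct submission
                   (ignored when the job was submitted with msub).
    all_rms:       every resource manager in the hypothetical grid.
    """
    if submitted_via == "msub":
        # Jobs submitted through msub may be migrated to any resource manager.
        return list(all_rms)
    # Jobs submitted directly stay with the resource manager that received them.
    return [rm for rm in all_rms if rm == submitted_to]


all_rms = ["torque", "loadleveler"]
print(candidate_resource_managers("msub", None, all_rms))
print(candidate_resource_managers("qsub", "torque", all_rms))
```

The first call lists both resource managers. The second lists only the one that received the direct submission.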

Example 2

A research lab wants to use spare cycles on its four clusters, each of which is running a local resource manager. In addition to providing better site-wide load balancing, the goal is to also provide some of its users with single point access to all compute resources. Various researchers have made it clear that this new multi-cluster load balancing must not impose any changes on users who are currently using these clusters by submitting jobs locally to each cluster. As a final constraint, the lab wants to allow both web portal and command line based job submission to access the cumulative cluster resources, but the system administrators will not allow the web server to directly access the head nodes of any of the clusters.

In this example, the scheduler mode of the destination clusters should be set to NORMAL rather than SLAVE. In SLAVE mode, Moab makes no local decisions—it simply follows the directions of remote trusted peers. In NORMAL mode, each Moab is fully autonomous, scheduling all local workload and coordinating with remote peers when and how to schedule migrated jobs.

To address the web portal logistics, the lab decides to place the source Moab server on a dedicated node away from all of the head nodes. Moab Access Portal is configured to communicate only with this Moab server. The Moab server running on each cluster is configured to accept the source Moab server as a peer, but to report only aggregate resource availability by using the grid flag. The Moab Access Portal is then able to show users aggregate lab-wide configured and available compute resources; allow portal users to submit, view, and manage their jobs; and even create personal reservations across the various clusters.

From the perspective of a local cluster user, no new behaviors are seen. Remote jobs are migrated in from time to time, but to the user each job looks as if it were locally submitted. The user continues to submit, view, and manage jobs as before, using existing local job scripts.

17.1.10 Viewing Jobs and Resources

By default, each destination Moab server reports all compute nodes it finds back to the source Moab server. These reported nodes appear within the source Moab as local nodes, each within a partition associated with the resource manager reporting them. For example, if a source resource manager were named slave1, all nodes reported by it would be associated with the slave1 partition. Users and administrators communicating with the source Moab via Moab Cluster Manager, Moab Access Portal, or standard Moab command line tools can view and analyze all reported nodes.

For security and scalability purposes, this global resource view may not be desirable. To address this, peers may be configured to report a grid view (as opposed to the default full view). In the grid view, the peer cluster is represented as a large SMP system and a single node with the cluster's name is reported back. For scheduling purposes, additional information may be tracked internally, but only this one pseudo-node is visible. To configure a destination Moab server to report back using the grid view, set the FLAGS attribute of the RMCFG parameter to grid.

NOTE: The grid view will be displayed if either the source or the destination server is configured with grid view.
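The difference between the full view and the grid view can be pictured with a short sketch. It is a toy model, not Moab code: the Node class and the full_view, grid_view, and reported_view functions are hypothetical. Summing processors into the pseudo-node is an assumption based on the "large SMP system" description, and using one name for both the cluster and its partition is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node record; not a Moab data structure."""
    name: str
    procs: int
    partition: str = ""


def full_view(rm_name, nodes):
    """Report every node, each in the partition named after the reporting resource manager."""
    return [Node(n.name, n.procs, rm_name) for n in nodes]


def grid_view(cluster_name, nodes):
    """Report a single pseudo-node named after the cluster (assumed to sum all processors)."""
    return [Node(cluster_name, sum(n.procs for n in nodes), cluster_name)]


def reported_view(rm_name, nodes, source_grid=False, destination_grid=False):
    """Use the grid view if either the source or the destination is configured for it."""
    if source_grid or destination_grid:
        return grid_view(rm_name, nodes)
    return full_view(rm_name, nodes)


nodes = [Node("n01", 8), Node("n02", 8)]
print(reported_view("slave1", nodes))
print(reported_view("slave1", nodes, destination_grid=True))
```

With the default full view, both nodes appear in the slave1 partition. With the grid view enabled on either side, only one pseudo-node named slave1 is reported.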

Another type of resource configuration, a grid sandbox, is available for destination clusters. Grid sandboxes can both constrain external resource access and limit which resources are reported to other peers. See Managing Resources with Grid Sandboxes for more information.

For job information, by default a destination reports to the source Moab only information about jobs that originated at the source. If information about other jobs is desired, this can be configured as shown in the Workload Submission and Control section.
