Data Staging

Appendix L: Grid Data Staging Details

Moab allows sites to manage job data staging requirements so as to minimize resource inefficiencies and maximize system utilization. Without scheduler-controlled data staging, a job must handle its own data staging. This leads to inefficiencies as a job does not use its assigned compute resources while waiting for its data to be staged. Moab's data staging facilities prevent the loss of compute resources due to data blocking and can significantly improve cluster performance.

Note: Moab Workload Manager can only schedule data-staging operations if the involved resource managers use Moab as their primary scheduler. Otherwise, the capabilities described below will not function.

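This section describes the following:

  • Data Staging Models
  • Verified Data Staging (External)
  • Prioritized Data Staging (Loose)
  • Submitting Jobs which Request Data Staging Services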

L.1 Data Staging Models

Moab supports, or plans to support, four different data staging models:

  1. Verified Data Staging (External)
  2. Prioritized Data Staging (Loose)
  3. Fully-Scheduled Data Staging (Tight)
  4. Data Staging to Allocated Nodes (Local)

The DATASTAGEMODEL parameter is used to configure Moab to use one of these models.

In each model, Moab handles data staging using a storage resource manager interface. This interface is configured using the RMCFG parameter. To actually drive the storage resource manager, a number of RM interface attributes must be set. The TYPE, RESOURCETYPE, and SYSTEMQUERYURL attributes must always be set. In addition, other attributes will be required depending on the data staging model used. Then, job submission resource managers can use this storage interface to stage data by specifying it with the DATARM attribute.

  • TYPE - must be NATIVE in all cases
  • RESOURCETYPE - must be set to STORAGE in all cases
  • SYSTEMQUERYURL - specifies method of determining file attributes such as size, ownership, etc.
  • CLUSTERQUERYURL - specifies method of determining current and configured storage manager resources such as available disk space, etc.
  • SYSTEMMODIFYURL - specifies method of initiating file creation, file deletion, and data migration
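
As a rough illustration of the always-required attributes listed above, the following Python sketch collects the attributes from RMCFG lines and reports which required settings are missing or wrong. It is not part of Moab; the function names (parse_rmcfg, missing_required) and the simple whitespace-based parsing are assumptions made for this example.

```python
import re

# Attributes this appendix says must always be set on a storage resource manager.
# A value of None means "any non-empty value is acceptable".
REQUIRED = {"TYPE": "NATIVE", "RESOURCETYPE": "STORAGE", "SYSTEMQUERYURL": None}

def parse_rmcfg(text, name):
    """Collect ATTR=VALUE pairs from every RMCFG[name] line (simplified parser)."""
    attrs = {}
    pattern = re.compile(r"^\s*RMCFG\[%s\]\s+(.*)$" % re.escape(name))
    for line in text.splitlines():
        match = pattern.match(line)
        if not match:
            continue
        for token in match.group(1).split():
            key, sep, value = token.partition("=")
            if sep:
                attrs[key.upper()] = value
    return attrs

def missing_required(attrs):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    for key, expected in REQUIRED.items():
        value = attrs.get(key)
        if not value:
            problems.append(key + " is not set")
        elif expected is not None and value.upper() != expected:
            problems.append(key + " must be " + expected)
    return problems
```

Running missing_required(parse_rmcfg(text, "data")) against the contents of a moab.cfg file gives a quick sanity check before restarting the scheduler. Model-specific attributes such as SYSTEMMODIFYURL are deliberately not checked here.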

Moab is pre-packaged with several interface scripts that will work for many situations. These scripts are located in the tools directory (those matching *.dstage.pl) and may be customized to fit your particular needs. To use these scripts, simply define a resource manager with the needed URL attribute pointing to the appropriate script.

L.2 Verified Data Staging (External)

In this model, an external data server entity is responsible for staging needed job data. Moab has no control or influence over the timing or execution of data staging decisions. It can only determine that a job has data staging requirements and avoid starting the job until it can verify that these requirements are met. This data staging model eliminates the situation where a job is assigned resources it is unable to immediately use.

To determine when the stage-in operation is complete, Moab uses a storage resource manager SYSTEMQUERYURL interface to retrieve information about the files being staged (see below for more information). Optionally, Moab will provide diagnostic information about the storage resource manager if the CLUSTERQUERYURL interface is specified.
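
The gating behavior described above can be pictured with a small Python sketch. This is only a model of the rule "do not start the job until every stage-in requirement is verified"; the function name and the dictionary of per-file statuses are hypothetical and do not reflect the actual format returned by a SYSTEMQUERYURL script.

```python
def can_start(required_files, reported):
    """Return True only when every required stage-in file is reported complete.

    required_files: iterable of file names the job declared as stage-in data.
    reported: dict mapping file name -> status string ("pending", "active",
              or "complete"); a hypothetical summary of the storage query result.
    """
    # A file that is missing from the report is treated as not yet staged.
    return all(reported.get(name) == "complete" for name in required_files)
```

For example, can_start(["big301.dat"], {"big301.dat": "active"}) is False, so the job would remain idle until a later query reports the file as complete.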

To take advantage of Verified Data Staging, a job must be submitted with an indication of its stage-in data requirements. The resource manager extension STAGEIN is used to indicate a job's stage-in data files. This extension can be used directly by the user or inserted via a portal or submit filter. For an example, see the TORQUE submission filter page.

Example (w/TORQUE)

moab.cfg
...
RMCFG[torque] TYPE=PBS DATARM=data

DATASTAGEMODEL EXTERNAL
RMCFG[data] TYPE=NATIVE  RESOURCETYPE=STORAGE
RMCFG[data] SYSTEMQUERYURL=exec://$TOOLSDIR/system.query.dstage.pl
...

qsub
> qsub -W x="STAGEIN:file:///home/jsmith/big301.dat" job.cmd

1435.jupiter submitted 

Diagnostics

Moab displays information about data staging in:

Checkjob

The checkjob command reports information on both input and output data stage requests. This information includes the following:

  • stage type - input or output
  • file name - reports destination file only
  • status - pending, active, or complete
  • file size - size of file to transfer
  • data transferred - for active transfers, reports number of bytes already transferred

Example

checkjob
$ checkjob -v 412
job 412 (RM job '412.geophys.icluster')

State: Idle
Creds:  user:test2  group:test2  class:batch  qos:DEFAULT
WallTime: 00:00:00 of 00:16:40
SubmitTime: Mon Jun  6 15:11:24
  (Time Queued  Total: 00:00:56  Eligible: 00:00:39)

StageIn:  File=$HOME/data14.txt  Size=91 MB  Status=complete
...
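
Sites that post-process checkjob output may find a sketch like the following useful. It extracts the fields of a StageIn line in the layout shown in the example above. The regular expression is an assumption based solely on that one sample line; the label used for output stage requests is not shown in the example, so it is not handled here.

```python
import re

# Matches a line such as:
#   StageIn:  File=$HOME/data14.txt  Size=91 MB  Status=complete
STAGE_IN_LINE = re.compile(
    r"^StageIn:\s+File=(?P<file>\S+)\s+"
    r"Size=(?P<size>.+?)\s+Status=(?P<status>\w+)\s*$"
)

def parse_stage_in(output):
    """Return one dict (file, size, status) per StageIn line found in the text."""
    records = []
    for line in output.splitlines():
        match = STAGE_IN_LINE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records
```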

Checknode

The checknode command will report information on storage managers' pending, active, and completed data stage requests as well as cluster resources dedicated to these requests. This information includes the following:

  • active and max storage manager data staging operations
  • dedicated and max storage manager disk usage
  • file name - reports destination file only
  • status - pending, active, or complete
  • file size - size of file to transfer
  • data transferred - for active transfers, reports number of bytes already transferred

Example

checknode
$ checknode -v storage.koa
node storage.koa

State:      Idle  (in current state for 00:01:59)
Configured Resources: DISK: 71G  dsop: 8
Utilized   Resources: DISK: 25G
Dedicated  Resources: ---
Active Data Staging Operations:  1 (limit: 8)
  job              410  complete (3091 bytes)  ($HOME/test.dat)
  job              411  complete (42 MB)  ($SCRATCH/modeldata.3)
  job              414  complete (813 MB)  ($SCRATCH/phys.john13)
  job              415  complete (16544 bytes)  ($SCRATCH/iolist.ng)
  job              419  complete (91 bytes)  ($HOME/data37.txt)
  job              422    active (37 of 83 MB)  ($SCRATCH/modeldata.4)

Dedicated Storage Manager Disk Usage:  938 of 73057 MB (Target=18264 MB)
Cluster Query URL:  exec:///$HOME/tools/dsquery.pl
Partition:  ALL  Rack/Slot:  ---
Flags:      rmdetected
RM[storage]:    TYPE=NATIVE:AGFULL

Total Time: 3:01:01:08  Up: 3:01:01:08 (100.00%)  Active: 00:00:00 (0.00%)

Reservations:  ---

...
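
The per-job lines in the checknode example above can be summarized with a short Python sketch. The pattern is inferred from the sample output only and may need adjusting for other Moab versions; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   job              422    active (37 of 83 MB)  ($SCRATCH/modeldata.4)
OP_LINE = re.compile(
    r"^\s*job\s+(?P<job>\S+)\s+(?P<status>pending|active|complete)\s+"
    r"\((?P<progress>[^)]*)\)\s+\((?P<file>[^)]*)\)\s*$"
)

def parse_stage_ops(output):
    """Return one dict (job, status, progress, file) per data staging line."""
    ops = []
    for line in output.splitlines():
        match = OP_LINE.match(line)
        if match:
            ops.append(match.groupdict())
    return ops

def count_by_status(ops):
    """Count staging operations per status (pending, active, complete)."""
    return Counter(op["status"] for op in ops)
```

Applied to the example output above, this yields five complete operations and one active operation, which agrees with the reported "Active Data Staging Operations:  1".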

L.3 Prioritized Data Staging (Loose)

In this model, Moab is assumed to have influence over the order in which data staging operations are executed. Moab still doesn't have full control over the staging, but is responsible for initiating the data staging operations for each job. Also, Moab assumes that the data server is unable to provide an accurate estimate of when a data migration request will be complete.

To allow Moab to initiate a data staging operation, a storage manager must be configured with the SYSTEMMODIFYURL and the SYSTEMQUERYURL attributes set. Further, if data manager throttling is desired, the CLUSTERQUERYURL attribute should be set to allow Moab to monitor data resource usage and prevent possible data cache thrashing.

If Moab detects a job with data stage-in requirements, it first checks that the job's assigned resource manager has a storage manager associated with it. If this is the case and the storage manager has the SYSTEMMODIFYURL attribute set, Moab attempts to stage the data using the interface defined by SYSTEMMODIFYURL. Moab blocks the job until the staging operation is complete. Because this model allows Moab to explicitly request data migration actions, Moab can control when each request is made and, to some degree, have data staged according to batch system job prioritization and compute resource availability constraints. Consequently, Moab can seek to maximize the use of the data manager so as to optimize cluster performance and minimize response times for the most important jobs.
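
The sequence of checks in the preceding paragraph can be sketched in Python as follows. This is an illustrative model, not Moab's implementation: the job and configuration dictionaries and the returned outcome labels are hypothetical, and the source does not state what Moab does when a check fails.

```python
def staging_action(job, rm_config):
    """Name the outcome of the stage-in checks for one job (simplified model).

    job: dict with "rm" (name of the job's resource manager) and
         "stagein" (list of files to stage in).
    rm_config: dict mapping resource manager name -> dict of its attributes.
    """
    if not job.get("stagein"):
        return "no-staging-needed"
    # The DATARM attribute links a job submission RM to its storage manager.
    data_rm = rm_config.get(job["rm"], {}).get("DATARM")
    if data_rm is None:
        return "no-storage-manager"
    if "SYSTEMMODIFYURL" not in rm_config.get(data_rm, {}):
        return "cannot-initiate"
    # Moab initiates the transfer and blocks the job until it completes.
    return "stage-and-block"
```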

As mentioned above, if the CLUSTERQUERYURL attribute is set, Moab will monitor and control the disk usage on the storage resource manager. In addition to setting this attribute, you must create or modify the $MOABHOMEDIR/dataspaces.tab file to include any data space areas that you would like Moab to monitor. Multiple locations on remote nodes can be monitored for availability and disk space. (See the example below for syntax.)

Example 1: Prioritized Data Staging with Data Cache Constraints (w/TORQUE)

moab.cfg
...
RMCFG[torque] TYPE=PBS DATARM=data

DATASTAGEMODEL LOOSE
RMCFG[data] TYPE=NATIVE RESOURCETYPE=STORAGE
RMCFG[data] SYSTEMQUERYURL=exec://$TOOLSDIR/system.query.dstage.pl
RMCFG[data] CLUSTERQUERYURL=exec://$TOOLSDIR/cluster.query.dstage.pl
RMCFG[data] SYSTEMMODIFYURL=exec://$TOOLSDIR/system.modify.dstage.pl
...

dataspaces.tab
# FORMAT: <protocol>://<host>/<remote_path> STATE=active

scp://head_node/home/ STATE=active
scp://data_node/cluster/users/storage/ STATE=active

qsub
> qsub -W x="STAGEIN:file:///tmp/big01.dat|file:///tmp/big02.dat,file:///home/test/" chembio.cmd

1455.jupiter submitted 
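
To make the dataspaces.tab layout in Example 1 concrete, the following Python sketch reads entries in the <protocol>://<host>/<remote_path> STATE=<state> form. It is an assumption that lines starting with # are comments, based only on the FORMAT line in the example; the function name is hypothetical.

```python
from urllib.parse import urlsplit

def parse_dataspaces(text):
    """Return one dict (protocol, host, path, state) per dataspaces.tab entry."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        url, _, rest = line.partition(" ")
        parts = urlsplit(url)
        state = None
        for token in rest.split():
            key, sep, value = token.partition("=")
            if sep and key.upper() == "STATE":
                state = value
        entries.append({
            "protocol": parts.scheme,
            "host": parts.netloc,
            "path": parts.path,
            "state": state,
        })
    return entries
```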

Example 2: Prioritized Data Staging with Data Cache and Transfer Agent Constraints

A given site uses a hierarchical storage manager (HSM) in conjunction with a single large SMP system. Preliminary monitoring indicates that only 25% of SMP to HSM traffic is input file based and 75% is output file based. The site also currently manages data stageback using a homegrown solution which stages data back after job completion. Consequently, in order to free up compute resources at the earliest time possible, Moab needs to intelligently prestage the data to ensure that total data stage does not exceed 25% of total SMP disk resources.

In addition, the HSM system is known to perform best with 8 or fewer active data transfer agents. When this value is exceeded, some level of thrashing appears and performance is reduced. In the following configuration, the MAXDSOP attribute is used to prevent more than 8 simultaneous stage-in requests and the TARGETUSAGE attribute prevents more than 25% of available disk resources from being consumed by input data staging requests.

moab.cfg
...
RMCFG[smp] DATARM=hsm

DATASTAGEMODEL LOOSE
RMCFG[hsm] TYPE=NATIVE RESOURCETYPE=STORAGE
RMCFG[hsm] TARGETUSAGE=25%  MAXDSOP=8
RMCFG[hsm] SYSTEMQUERYURL=exec://$TOOLSDIR/system.query.dstage.pl
RMCFG[hsm] CLUSTERQUERYURL=exec://$TOOLSDIR/cluster.query.dstage.pl
RMCFG[hsm] SYSTEMMODIFYURL=exec://$TOOLSDIR/system.modify.dstage.pl
...
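
The combined effect of the two limits in Example 2 can be modeled with a short Python sketch. This is not Moab's scheduling algorithm; it only illustrates how a cap on simultaneous operations (as with MAXDSOP) and a disk usage target (as with TARGETUSAGE) restrict which stage-in requests may start. All names are hypothetical, and skipping a request that is too large in favor of smaller ones is a design choice of this sketch.

```python
def select_stage_requests(pending, active_ops, max_ops,
                          disk_used_mb, disk_total_mb, target_fraction):
    """Pick pending stage-in requests to start, highest priority first.

    pending: list of (priority, job_id, size_mb) tuples.
    Returns the job ids that can start without exceeding max_ops concurrent
    operations or pushing staged data past target_fraction of total disk.
    """
    budget_mb = disk_total_mb * target_fraction - disk_used_mb
    slots = max_ops - active_ops
    chosen = []
    for priority, job_id, size_mb in sorted(pending, key=lambda r: -r[0]):
        if slots <= 0:
            break  # operation limit reached
        if size_mb > budget_mb:
            continue  # too large for the remaining disk budget
        chosen.append(job_id)
        slots -= 1
        budget_mb -= size_mb
    return chosen
```

With one free operation slot and a 250 MB budget, a 300 MB request from the highest-priority job is passed over and the next request that fits is started instead.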

Example 3: Grid Data Staging

moab.cfg
...
SCHEDCFG[source] MODE=NORMAL SERVER=gridhead:5353
ADMINCFG[1] USERS=sys

RMCFG[base] TYPE=PBS

RMCFG[cluster3] SERVER=moab://gridcluster3:5353 DATARM=c3storage

DATASTAGEMODEL LOOSE
RMCFG[c3storage] TYPE=NATIVE RESOURCETYPE=STORAGE
RMCFG[c3storage] SYSTEMQUERYURL=exec://$TOOLSDIR/dstage-ssh.systemquery.pl
RMCFG[c3storage] CLUSTERQUERYURL=exec://$TOOLSDIR/dstage-ssh.clusterquery.pl
RMCFG[c3storage] SYSTEMMODIFYURL=exec://$TOOLSDIR/dstage-ssh.systemmodify.pl
...

L.4 Submitting Jobs which Request Data Staging Services

Jobs submitted directly by end users, from grid schedulers, or via application or user portals may request intelligent data staging of input files (stage-in) by using the MSTAGEIN resource manager extension. This alerts Moab that the job cannot start until the input data files are staged in by the SYSTEMMODIFYURL interface.

All jobs submitted to a resource manager that has an associated storage manager may also take advantage of implicit stage-out of standard output and standard error files. If the associated storage manager is configured with a SYSTEMMODIFYURL interface, then when the job completes successfully, the standard output and error files are transferred automatically back to the user's home directory. This feature is especially useful for disparate clusters in a grid environment. (Note: This implicit stage-out feature is not currently available for all resource managers.)

