Resource Manager Extensions
Moab Workload Manager®

13.3 Resource Manager Extensions

Not all resource managers are created equal. Capabilities vary widely from system to system, and there is a large body of functionality that many, if not all, resource managers have no concept of. Job QoS is a good example: because most resource managers have no concept of quality of service, they provide no mechanism for users to specify it. In many cases, Moab can add such capabilities at a global level. However, a number of features require a per-job specification. Resource manager extensions allow this information to be associated with the job.

13.3.1 Resource Manager Extension Specification

Specifying resource manager extensions varies by resource manager. TORQUE, OpenPBS, PBSPro, Loadleveler, LSF, S3, and Wiki each allow the specification of an extension field as described in the following table:

Resource Manager: TORQUE 2.0+
Specification Method: -l
Example (qsub):
> qsub -l nodes=3,qos=high sleepy.cmd

Resource Manager: TORQUE 1.x/OpenPBS
Specification Method: -W x=
Example (qsub):
> qsub -l nodes=3 -W x=qos:high sleepy.cmd

NOTE: OpenPBS does not support this ability by default but can be patched as described in the PBS Resource Manager Extension Overview.

Resource Manager: Loadleveler
Specification Method: #@comment
Example (loadleveler command file):
#@nodes = 3
#@comment = qos:high

Resource Manager: LSF
Specification Method: -ext
Example (bsub):
> bsub -ext advres:system.2

Resource Manager: PBSPro
Specification Method: -l
Example (qsub):
> qsub -l advres=system.2

NOTE: Use of PBSPro resources requires configuring the server_priv/resourcedef file to define the needed extensions as in the following example:

server_priv/resourcedef
advres type=string
qos    type=string
sid    type=string
sjid   type=string

Resource Manager: Wiki
Specification Method: comment
Example (WIKI arg):
comment=qos:high

13.3.2 Resource Manager Extension Values

Using the resource manager-specific method, the following job extensions are currently available:

ADVRES BANDWIDTH DDISK DEADLINE DEPEND DMEM FEATURE GATTR
GEOMETRY GMETRIC GRES HOSTLIST JGROUP JOBFLAGS LATENCY LOGLEVEL
MAXMEM MAXPROC MINPREEMPTTIME MINPROCSPEED MINWCLIMIT MSTAGEIN MSTAGEOUT NACCESSPOLICY
NALLOCPOLICY NODESCALING NODESET OPSYS PARTITION PREF QOS QUEUEJOB
REQATTR RESFAILPOLICY RMTYPE SID SIGNAL SOFTWARE SPRIORITY TASKDISTPOLICY
TEMPLATE TERMTIME TPN TRIG TRL TTC VAR
                   

NAME: ADVRES
FORMAT: [<RSVID>]
DEFAULT VALUE: ---
DESCRIPTION: Specifies that reserved resources are required to run the job. If <RSVID> is specified, then only resources within the specified reservation may be allocated. (See Job to Reservation Binding.)
EXAMPLE
qsub
> qsub -l advres=grid.3

NAME: BANDWIDTH
FORMAT: <DOUBLE> (in MB/s)
DEFAULT VALUE: ---
DESCRIPTION: Minimum available network bandwidth across allocated resources. (See Network Management.)
EXAMPLE
bsub
> bsub -ext bandwidth=120 chemjob.txt

NAME: DDISK
FORMAT: <INTEGER>
DEFAULT VALUE: 0
DESCRIPTION: Dedicated disk per task in MB.
EXAMPLE
TORQUE qsub
> qsub -l ddisk=2000

NAME: DEADLINE
FORMAT: [[[DD:]HH:]MM:]SS
DEFAULT VALUE: ---
DESCRIPTION: Relative completion deadline of job (from job submission time).
EXAMPLE
TORQUE qsub
> qsub -l deadline=2:00:00,nodes=4 /tmp/bio3.cmd
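
The [[[DD:]HH:]MM:]SS format can be read right to left: seconds, then minutes, hours, and days. The following is a minimal sketch of that conversion, not Moab's own parser; the function name parse_duration is hypothetical and chosen only for illustration.

```python
def parse_duration(spec: str) -> int:
    """Convert a [[[DD:]HH:]MM:]SS string into a number of seconds.

    Hypothetical helper for illustration; it is not part of Moab.
    """
    fields = spec.split(":")
    if not 1 <= len(fields) <= 4 or not all(f.isdigit() for f in fields):
        raise ValueError(f"not a [[[DD:]HH:]MM:]SS value: {spec!r}")
    # The rightmost field is seconds; moving left gives minutes, hours, days.
    multipliers = [1, 60, 3600, 86400]
    return sum(int(f) * m for f, m in zip(reversed(fields), multipliers))


print(parse_duration("2:00:00"))  # the deadline used in the example above
print(parse_duration("900"))      # a bare seconds value
```

Under this reading, deadline=2:00:00 asks for completion within 7200 seconds of submission.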

NAME: DEPEND
FORMAT: [<DEPENDTYPE>:][{jobname|jobid}.]<ID>[:[{jobname|jobid}.]<ID>]...
DEFAULT VALUE: ---
DESCRIPTION: Allows specification of job dependencies for compute or system jobs. If no ID prefix (jobname or jobid) is specified, the ID value is interpreted as a job ID.
EXAMPLE
Moab msub
# submit job which will run after job 1301 and 1304 complete
> msub -l depend=orion.1301:orion.1304 test.cmd

orion.1322

# submit prereq job
> msub -N data1005 prestage.cmd

orion.1427

# submit jobname-based dependency job
> msub -l depend=jobname.data1005 dataetl.cmd

orion.1428
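
The prefix rule in the description (an ID with no jobname or jobid prefix is treated as a job ID) can be sketched as follows. This is an illustrative helper only: the name classify_dependency is hypothetical, and the sketch assumes the optional <DEPENDTYPE> has already been removed from the value.

```python
def classify_dependency(token: str) -> tuple:
    """Return (kind, id) for one dependency token.

    Hypothetical helper; assumes any leading <DEPENDTYPE>: was stripped.
    """
    for prefix in ("jobname", "jobid"):
        if token.startswith(prefix + "."):
            return prefix, token[len(prefix) + 1:]
    # No explicit prefix: the value is interpreted as a job ID.
    return "jobid", token


for token in "orion.1301:orion.1304".split(":"):
    print(classify_dependency(token))
print(classify_dependency("jobname.data1005"))
```

Note that orion.1301 contains a dot but no recognized prefix, so the whole string is kept as the job ID.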

NAME: DMEM
FORMAT: <INTEGER>
DEFAULT VALUE: 0
DESCRIPTION: Dedicated memory per task in MB.
EXAMPLE
DMEM:512

NAME: FEATURE
FORMAT: <FEATURE>[{:|}<FEATURE>]... (the delimiter is either a colon or a pipe)
DEFAULT VALUE: ---
DESCRIPTION: Required list of node attributes or node features. NOTE: If the pipe (|) character is used as a delimiter, the features are logically OR'd together and the associated job may use resources that match any of the specified features.
EXAMPLE
qsub (TORQUE 2.2+)
> qsub -l feature='fastos:bigio' testjob.cmd

NAME: GATTR
FORMAT: <STRING>
DEFAULT VALUE: ---
DESCRIPTION: Generic job attribute associated with job.
EXAMPLE
qsub
> qsub -l gattr=bigjob

NAME: GEOMETRY
FORMAT: { <TASKID>[,<TASKID>]... }[,{ <TASKID>[,<TASKID>]... }]...
DEFAULT VALUE: ---
DESCRIPTION: Explicitly specified task geometry.
EXAMPLE
qsub
> qsub -l nodes=2:ppn=4 -W x=geometry:'{0,1,4,5},{2,3,6,7}' quanta2.cmd

NAME: GMETRIC
FORMAT: generic metric requirement for allocated nodes where the requirement is specified using the format <GMNAME>[:{lt:,le:,eq:,ge:,gt:,ne:}<VALUE>]
DEFAULT VALUE: ---
DESCRIPTION: Indicates generic constraints that must be found on all allocated nodes. If a <VALUE> is not specified, the node must simply possess the generic metric. (See Generic Metrics for more information.)
EXAMPLE
TORQUE qsub
> qsub -l gmetric=bioversion:ge:133244 testj.txt
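
To make the comparison semantics concrete, the sketch below checks a single node against a requirement written in the colon-separated form used in the example (name, comparator, value). It is an illustration under assumptions, not Moab's implementation: node_satisfies and the node_metrics dictionary are hypothetical, and metric values are assumed to be numeric.

```python
import operator

# Comparator keywords from the FORMAT line, mapped to Python operators.
_COMPARATORS = {
    "lt": operator.lt, "le": operator.le, "eq": operator.eq,
    "ge": operator.ge, "gt": operator.gt, "ne": operator.ne,
}


def node_satisfies(requirement: str, node_metrics: dict) -> bool:
    """Check one node's metrics against '<GMNAME>[:<cmp>:<VALUE>]'.

    Hypothetical helper; assumes numeric metric values.
    """
    parts = requirement.split(":")
    name = parts[0]
    if name not in node_metrics:
        return False
    if len(parts) == 1:
        # No value given: the node only needs to possess the metric.
        return True
    if len(parts) != 3 or parts[1] not in _COMPARATORS:
        raise ValueError(f"malformed gmetric requirement: {requirement!r}")
    return _COMPARATORS[parts[1]](float(node_metrics[name]), float(parts[2]))


print(node_satisfies("bioversion:ge:133244", {"bioversion": 133300}))
print(node_satisfies("bioversion:ge:133244", {"bioversion": 120000}))
```

Because the constraint must hold on all allocated nodes, a caller would apply this check to every candidate node.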

NAME: GRES and SOFTWARE
FORMAT: comma-delimited list of generic resources where each resource is specified using the format <RESTYPE>[{+|:}<COUNT>][@<TIMEFRAME>]
DEFAULT VALUE: ---
DESCRIPTION: Indicates generic resources required by the job on a per-task basis. If a <COUNT> is not specified, the resource count defaults to 1. If the <TIMEFRAME> is specified, the generic resource is consumed from the start of the job until <TIMEFRAME> expires; otherwise the resource is consumed during the entire life of the job.
EXAMPLE
TORQUE qsub with -W
> qsub -W x=GRES:tape+2,matlab+3@2:00 testj.txt

NOTE: When specifying more than one generic resource with -l, the '%' character must be used to delimit them.

TORQUE qsub with -l
> qsub -l gres=tape+2%matlab+3 testj.txt
TORQUE qsub with -l
> qsub -l software=matlab:2 testj.txt
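
The <RESTYPE>[{+|:}<COUNT>][@<TIMEFRAME>] grammar, including the default count of 1, can be sketched as below. This is an illustrative parser only; parse_gres is a hypothetical name, and the timeframe is returned as an unparsed string because its interpretation is left to the scheduler.

```python
import re

# One generic resource: type, optional +COUNT or :COUNT, optional @TIMEFRAME.
_GRES_ITEM = re.compile(
    r"^(?P<type>[^+:@]+)(?:[+:](?P<count>\d+))?(?:@(?P<timeframe>.+))?$"
)


def parse_gres(spec: str, delimiter: str = ",") -> list:
    """Split a generic resource list into (type, count, timeframe) tuples.

    Hypothetical helper; pass delimiter='%' for the qsub -l form.
    """
    result = []
    for item in spec.split(delimiter):
        match = _GRES_ITEM.match(item)
        if match is None:
            raise ValueError(f"malformed generic resource: {item!r}")
        # A missing count defaults to 1, as stated in the description.
        result.append((match["type"], int(match["count"] or 1), match["timeframe"]))
    return result


print(parse_gres("tape+2,matlab+3@2:00"))
print(parse_gres("tape+2%matlab+3", delimiter="%"))
```

The first call corresponds to the -W x=GRES example and the second to the -l gres example with the '%' delimiter.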

NAME: HOSTLIST
FORMAT: '+' delimited list of hostnames
DEFAULT VALUE: ---
DESCRIPTION: Indicates an exact set, superset, or subset of nodes on which the job must run. NOTE: Use the caret (^) or asterisk (*) characters to specify a host list as superset or subset respectively.
EXAMPLE
msub
> msub -l hostlist=nodeA+nodeB+nodeE

NAME: JGROUP
FORMAT: <JOBGROUPID>
DEFAULT VALUE: ---
DESCRIPTION: ID of job group to which this job belongs (different from the GID of the user running the job).
EXAMPLE
JGROUP:bluegroup

NAME: JOBFLAGS (aka FLAGS)
FORMAT: one or more colon-delimited job flags, including ADVRES[:RSVID], NOQUEUE, NORMSTART, PREEMPTEE, PREEMPTOR, RESTARTABLE, SUSPENDABLE, or COALLOC (see job flag overview for a complete listing)
DEFAULT VALUE: ---
DESCRIPTION: Associates various flags with the job.
EXAMPLE
TORQUE qsub
> qsub -l nodes=1,walltime=3600,jobflags=advres tt.3

NAME: LATENCY
FORMAT: <DOUBLE> (in microseconds)
DEFAULT VALUE: ---
DESCRIPTION: Maximum average network latency across allocated resources. (See Network Management.)
EXAMPLE
TORQUE qsub
> qsub -l latency=2.5 hibw.cmd

NAME: LOGLEVEL
FORMAT: <INTEGER>
DEFAULT VALUE: ---
DESCRIPTION: Per-job log verbosity.
EXAMPLE
TORQUE qsub
> qsub -W x=loglevel:5 bw.cmd
Job events and analysis will be logged with level 5 verbosity.

NAME: MAXMEM
FORMAT: <INTEGER> (in megabytes)
DEFAULT VALUE: ---
DESCRIPTION: Maximum amount of memory the job may consume across all tasks before the JOBMEM action is taken.
EXAMPLE
TORQUE qsub
> qsub -W x=MAXMEM:1000mb bw.cmd
If a RESOURCELIMITPOLICY is set for per-job memory utilization, its action will be taken when this value is reached.

NAME: MAXPROC
FORMAT: <INTEGER>
DEFAULT VALUE: ---
DESCRIPTION: Maximum CPU load the job may consume across all tasks before the JOBPROC action is taken.
EXAMPLE
TORQUE qsub
> qsub -W x=MAXPROC:4 bw.cmd
If a RESOURCELIMITPOLICY is set for per-job processor utilization, its action will be taken when this value is reached.

NAME: MINPREEMPTTIME
FORMAT: [[[DD:]HH:]MM:]SS
DEFAULT VALUE: ---
DESCRIPTION: Minimum time job must run before being eligible for preemption.

NOTE: Can only be specified if associated QoS allows per-job preemption configuration by setting the preemptconfig flag.
EXAMPLE
TORQUE qsub
> qsub -l minpreempttime=900 bw.cmd
Job cannot be preempted until it has run for 15 minutes.

NAME: MINPROCSPEED
FORMAT: <INTEGER>
DEFAULT VALUE: 0
DESCRIPTION: Minimum processor speed (in MHz) for every node that this job will run on.
EXAMPLE
TORQUE qsub
> qsub -W x=MINPROCSPEED:2000 bw.cmd
Every node that runs this job must have a processor speed of at least 2000 MHz.

NAME: MINWCLIMIT
FORMAT: [[[DD:]HH:]MM:]SS
DEFAULT VALUE: 1:00:00
DESCRIPTION: Minimum wallclock limit job must run before being eligible for extension. (See JOBEXTENDDURATION.)
EXAMPLE
TORQUE qsub
> qsub -l minwclimit=300,walltime=16000 bw.cmd
Job will run for at least 300 seconds but up to 16,000 seconds if possible (without interfering with other jobs).

NAME: MSTAGEIN
FORMAT: [<SRCURL>[|<SRCURL>...],]<DSTURL>
DEFAULT VALUE: ---
DESCRIPTION: Indicates whether a job has data staging requirements. If more than one source URL is specified, the destination URL must be a directory.

The format of <SRCURL> is: [PROTO://][HOST[:PORT]][/PATH] where the path is local.

The format of <DSTURL> is: [PROTO://][HOST[:PORT]][/PATH] where the path is remote.

PROTO can be any of the following protocols: ssh, file, or gsiftp.
HOST is the name of the host where the file resides.
PATH is the path of the source or destination file. The destination path may be a directory when sending a single file and must be a directory when sending multiple files. If a directory is specified, it must end with a forward slash (/).

Valid variables include:
$JOBID
$HOME
$SUBMITHOST
$DEST
$LOCALDATASTAGEHEAD

NOTE: If no destination is given, the protocol and file name will be set to the same as the source.

EXAMPLE
msub
> msub -W x='mstagein=file://$HOME/test1.sh|file:///home/dev/test2.sh,ssh://host/home/dev/' script.sh
Copy test1.sh and test2.sh from the local machine to /home/dev/ on host.
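
The staging value combines pipe-separated source URLs with a single destination URL after the final comma. The sketch below splits such a value and enforces the rule that multiple sources require a directory destination ending in a forward slash. It is illustrative only: parse_stage_spec is a hypothetical name, URLs are assumed not to contain commas or pipes, and a value with no comma is treated as a lone destination, following the FORMAT line.

```python
def parse_stage_spec(spec: str) -> tuple:
    """Split '[<SRCURL>[|<SRCURL>...],]<DSTURL>' into (sources, destination).

    Hypothetical helper; assumes URLs contain no ',' or '|' characters.
    """
    if "," in spec:
        source_part, destination = spec.rsplit(",", 1)
        sources = source_part.split("|")
    else:
        # Assumption: with no comma, the whole value is the destination URL.
        sources, destination = [], spec
    # Multiple sources must be copied into a directory, written with a trailing '/'.
    if len(sources) > 1 and not destination.endswith("/"):
        raise ValueError("multiple sources require a directory destination ending in '/'")
    return sources, destination


sources, destination = parse_stage_spec(
    "file://$HOME/test1.sh|file:///home/dev/test2.sh,ssh://host/home/dev/"
)
print(sources)
print(destination)
```

Variables such as $HOME are left untouched here; expanding them is outside the scope of this sketch.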

NAME: MSTAGEOUT
FORMAT: [<SRCURL>[|<SRCURL>...],]<DSTURL>
DEFAULT VALUE: ---
DESCRIPTION: Indicates whether a job has data staging requirements. If more than one source URL is specified, the destination URL must be a directory.

The format of <SRCURL> is: [PROTO://][HOST[:PORT]][/PATH] where the path is remote.

The format of <DSTURL> is: [PROTO://][HOST[:PORT]][/PATH] where the path is local.

PROTO can be any of the following protocols: ssh, file, or gsiftp.
HOST is the name of the host where the file resides.
PATH is the path of the source or destination file. The destination path may be a directory when sending a single file and must be a directory when sending multiple files. If a directory is specified, it must end with a forward slash (/).

Valid variables include:
$JOBID
$HOME
$SUBMITHOST
$DEST
$LOCALDATASTAGEHEAD

NOTE: If no destination is given, the protocol and file name will be set to the same as the source.

EXAMPLE
msub
> msub -W x='mstageout=ssh://$DEST/$HOME/test1.sh|ssh://host/home/dev/test2.sh,ssh:///home/dev/' script.sh
Copy test1.sh and test2.sh from the remote machine, host, to /home/dev/ on the local machine.

NAME: NACCESSPOLICY
FORMAT: one of SHARED, SINGLEJOB, SINGLETASK, SINGLEUSER, or UNIQUEUSER
DEFAULT VALUE: ---
DESCRIPTION: Specifies how node resources should be accessed. (See Node Access Policies for more information.)

NOTE: The naccesspolicy option can only be used to make node access more constraining than is specified by the system, partition, or node policies. For example, if the effective node access policy is shared, naccesspolicy can be set to singleuser; if the effective node access policy is singlejob, naccesspolicy can be set to singletask.
EXAMPLE
TORQUE qsub
> qsub -l naccesspolicy=singleuser bw.cmd

LSF bsub
> bsub -ext naccesspolicy=singleuser lancer.cmd
Job can only allocate free nodes or nodes running jobs by the same user.

NAME: NALLOCPOLICY
FORMAT: one of the valid settings for the parameter NODEALLOCATIONPOLICY
DEFAULT VALUE: ---
DESCRIPTION: Specifies how node resources should be selected and allocated to the job. (See Node Allocation Policies for more information.)
EXAMPLE
TORQUE qsub
> qsub -l nallocpolicy=minresource bw.cmd
Job should use the minresource node allocation policy.

NAME: NODESCALING
FORMAT: <BOOLEAN>
DEFAULT VALUE: FALSE
DESCRIPTION: Specifies that the requested nodecount should be treated as a requested node equivalency amount.
EXAMPLE
TORQUE qsub
> qsub -l nodes=2000 -W x=NODESCALING:TRUE
Job will run on the equivalent of 2000 nodes (meaning the job may run on fewer nodes if the nodes are faster and vice versa).

NAME: NODESET
FORMAT: <SETTYPE>:<SETATTR>[:<SETLIST>]
DEFAULT VALUE: ---
DESCRIPTION: Specifies nodeset constraints for job resource allocation. (See the NodeSet Overview for more information.)
EXAMPLE
TORQUE qsub
> qsub -l nodeset=ONEOF:PROCSPEED:350:400:450 bw.cmd

NAME: OPSYS
FORMAT: <OperatingSystem>
DEFAULT VALUE: ---
DESCRIPTION: Specifies the job's required operating system.
EXAMPLE
qsub
> qsub -l nodes=1,opsys=rh73 chem92.cmd

NAME: PARTITION
FORMAT: <STRING>[{,|:}<STRING>]...
DEFAULT VALUE: ---
DESCRIPTION: Specifies the partition (or partitions) in which the job must run.

NOTE: The job must have access to this partition based on system wide or credential based partition access lists.
EXAMPLE
qsub
> qsub -l nodes=1,partition=math:geology
The job must only run in the math partition or the geology partition.

NAME: PREF
FORMAT: <STRING>[:<STRING>]...
DEFAULT VALUE: ---
DESCRIPTION: Specifies which node features are preferred by the job and should be allocated if available. If preferred node criteria are specified, Moab favors the allocation of matching resources but is not bound to only consider these resources.

NOTE: Preferences are not honored unless the node allocation policy is set to PRIORITY and the PREF priority component is set within the node's PRIORITYF attribute.
EXAMPLE
qsub
> qsub -l nodes=1,pref=bigmem

The job may run on any nodes but prefers to allocate nodes with the bigmem feature.

NAME: QOS
FORMAT: <STRING>
DEFAULT VALUE: ---
DESCRIPTION: Requests the specified quality of service (QoS) for the job.
EXAMPLE
qsub
> qsub -l walltime=1000,qos=highprio biojob.cmd

NAME: QUEUEJOB
FORMAT: <BOOLEAN>
DEFAULT VALUE: TRUE
DESCRIPTION: Indicates whether or not the scheduler should queue the job if resources are not available to run the job immediately.
EXAMPLE
QUEUEJOB:FALSE

NAME: REQATTR
FORMAT: Required node attributes with version number support: <ATTRIBUTE>[{>=|>|<=|<|=}<VERSION>]
DEFAULT VALUE: ---
DESCRIPTION: Indicates required node attributes.
EXAMPLE
TORQUE qsub with -l
> qsub -l reqattr=matlab=7.1 testj.txt
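
A REQATTR value is an attribute name optionally followed by a comparison operator and a version. The following sketch only separates those three parts and does not attempt to reproduce how Moab compares versions; parse_reqattr is a hypothetical name used for illustration.

```python
import re

# Two-character operators are listed first so '>=' is not read as '>' then '='.
_REQATTR = re.compile(r"^(?P<attr>[^<>=]+)(?:(?P<op>>=|<=|>|<|=)(?P<version>.+))?$")


def parse_reqattr(spec: str) -> tuple:
    """Split '<ATTRIBUTE>[<op><VERSION>]' into (attribute, operator, version).

    Hypothetical helper; operator and version are None when absent.
    """
    match = _REQATTR.match(spec)
    if match is None:
        raise ValueError(f"malformed reqattr value: {spec!r}")
    return match["attr"], match["op"], match["version"]


print(parse_reqattr("matlab=7.1"))
print(parse_reqattr("matlab>=7.1"))
```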

NAME: RESFAILPOLICY
FORMAT: one of CANCEL, HOLD, IGNORE, NOTIFY, or REQUEUE
DEFAULT VALUE: ---
DESCRIPTION: Specifies the action to take on an executing job if one or more allocated nodes fail. This setting overrides the global value specified with the NODEALLOCRESFAILUREPOLICY parameter.
EXAMPLE
RESFAILPOLICY
resfailpolicy=ignore
For this particular job, ignore node failures.

NAME: RMTYPE
FORMAT: <STRING>
DEFAULT VALUE: ---
DESCRIPTION: One of the resource manager types currently available within the cluster or grid. Typically, this is one of PBS, LSF, LL, SGE, SLURM, BProc, Condor, and so forth.
EXAMPLE
rmtype
rmtype=ll
Only run job on a Loadleveler destination resource manager.

NAME: SID
FORMAT: <STRING>
DEFAULT VALUE: ---
DESCRIPTION: System ID with which the job is associated.
EXAMPLE
sid
SID:silverA

NAME: SIGNAL
FORMAT: <INTEGER>[@<OFFSET>]
DEFAULT VALUE: ---
DESCRIPTION: Specifies the pre-termination signal to be sent to a job prior to it reaching its walltime limit or being terminated by Moab. The optional offset value specifies how long before job termination the signal should be sent. By default, the pre-termination signal is sent one minute before a job is terminated.
EXAMPLE
msub
> msub -l signal=32@120 bio45.cmd

NAME: SPRIORITY
FORMAT: <INTEGER>
DEFAULT VALUE: 0
DESCRIPTION: Allows Moab administrators to set a system priority on a job (similar to setspri).
EXAMPLE
spriority
> qsub -l nodes=16,spriority=100 job.cmd

NAME: TASKDISTPOLICY
FORMAT: one of RR or PACK
DEFAULT VALUE: ---
DESCRIPTION: Allows users to specify task distribution policies on a per-job basis. (See Task Distribution Overview.)
EXAMPLE
taskdistpolicy
> qsub -l nodes=16,taskdistpolicy=rr job.cmd

NAME: TEMPLATE
FORMAT: <STRING>
DEFAULT VALUE: ---
DESCRIPTION: Specifies a job template to be used as a set template. (See Job Templates.)
EXAMPLE
msub
> msub -l walltime=1000,nodes=16,template=biojob job.cmd

NAME: TERMTIME
FORMAT: <TIMESPEC>
DEFAULT VALUE: 0
DESCRIPTION: Specifies the time at which Moab should cancel a queued or active job. (See Job Deadline Support.)
EXAMPLE
msub
> msub -l nodes=10,walltime=600,termtime=12:00_Jun/14 job.cmd

NAME: TPN
FORMAT: <INTEGER>[+]
DEFAULT VALUE: 0
DESCRIPTION: Tasks per node allowed on allocated hosts. If the plus (+) character is specified, the tasks per node value is interpreted as a minimum tasks per node constraint; otherwise it is interpreted as an exact tasks per node constraint.

NOTE on Differences between TPN and PPN:

There are two key differences between the following: (A) qsub -l nodes=12:ppn=3 and (B) qsub -l nodes=12,tpn=3

The first difference is that ppn is interpreted as the minimum required tasks per node while tpn defaults to exact tasks per node. Case (B) executes the job with exactly 3 tasks on each allocated node, while case (A) executes the job with at least 3 tasks on each allocated node (for example, nodeA:4,nodeB:3,nodeC:5).

The second major difference is that nodes=X:ppn=Y actually requests X*Y tasks, whereas nodes=X,tpn=Y requests only X tasks.

EXAMPLE
msub
> msub -l nodes=10,walltime=600,tpn=4 job.cmd
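
The task-count difference described in the note can be worked through numerically. The sketch below simply encodes the two rules stated there (X*Y tasks for ppn, X tasks for tpn); requested_tasks is a hypothetical helper, not a Moab function.

```python
def requested_tasks(nodes: int, per_node: int, keyword: str) -> int:
    """Return the number of tasks requested by nodes=X with ppn=Y or tpn=Y.

    Hypothetical helper that encodes the rule from the TPN/PPN note.
    """
    if keyword == "ppn":
        # nodes=X:ppn=Y requests X*Y tasks.
        return nodes * per_node
    if keyword == "tpn":
        # nodes=X,tpn=Y requests only X tasks; tpn constrains their placement.
        return nodes
    raise ValueError(f"unknown keyword: {keyword!r}")


print(requested_tasks(12, 3, "ppn"))  # case (A): 36 tasks
print(requested_tasks(12, 3, "tpn"))  # case (B): 12 tasks
```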

NAME: TRIG
FORMAT: <TRIGSPEC>
DEFAULT VALUE: ---
DESCRIPTION: Adds trigger(s) to the job. (See the Trigger Specification Page for specific syntax.)

NOTE: Job triggers can only be specified if allowed by the QoS flag trigger.
EXAMPLE
qsub
> qsub -l trig=start:exec@/tmp/email.sh job.cmd

NAME: TRL (Format 1)
FORMAT: <INTEGER>[@<INTEGER>][:<INTEGER>[@<INTEGER>]]...
DEFAULT VALUE: 0
DESCRIPTION: Specifies alternate task requests with their optional walltimes. (See Malleable Jobs.)
EXAMPLE
msub
> msub -l trl=2@500:4@250:8@125:16@62 job.cmd

or

qsub
> qsub -l trl=2:3:4
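
The Format 1 value is a colon-separated list of task counts, each with an optional walltime after the @ sign. A minimal parsing sketch follows; parse_trl is a hypothetical name, and walltimes are assumed to be plain integers as the FORMAT line indicates.

```python
def parse_trl(spec: str) -> list:
    """Parse '<INTEGER>[@<INTEGER>][:...]' into (tasks, walltime) pairs.

    Hypothetical helper; walltime is None when no '@' part is given.
    """
    options = []
    for item in spec.split(":"):
        tasks, _, walltime = item.partition("@")
        options.append((int(tasks), int(walltime) if walltime else None))
    return options


print(parse_trl("2@500:4@250:8@125:16@62"))
print(parse_trl("2:3:4"))
```

In the first example each doubling of the task count roughly halves the walltime, which is the typical shape of a malleable job request.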

NAME: TRL (Format 2)
FORMAT: <INTEGER>-<INTEGER>
DEFAULT VALUE: 0
DESCRIPTION: Specifies a range of task requests that require the same walltime. (See Malleable Jobs.)
EXAMPLE
msub
> msub -l trl=32-64 job.cmd
NOTE: For optimization purposes, Moab does not perform an exhaustive search of all possible values but evaluates at least the beginning, the end, and 4 equally distributed choices in between.

NAME: TTC
FORMAT: <INTEGER>
DEFAULT VALUE: 0
DESCRIPTION: Total tasks allowed across the number of hosts requested. TTC is supported in the Wiki resource manager for SLURM. Compressed output must be enabled in the moab.cfg file. (See SLURMFLAGS for more information.) NODEACCESSPOLICY should be set to SINGLEJOB and JOBNODEMATCHPOLICY should be set to EXACTNODE in the moab.cfg file.
EXAMPLE
msub
> msub -l nodes=10,walltime=600,ttc=20 job.cmd
NOTE: In this example, assuming all the nodes are 8 processor nodes, the first allocated node will have 10 tasks, the next node will have 2 tasks, and the remaining 8 nodes will have 1 task each for a total task count of 20 tasks.

NAME: VAR
FORMAT: <ATTR>:<VALUE>
DEFAULT VALUE: ---
DESCRIPTION: Adds a generic variable to the job.
EXAMPLE
VAR:applicationtype:blast


13.3.3 Resource Manager Extension Examples

If more than one extension is required in a given job, extensions can be concatenated with a semicolon separator using the format <ATTR>:<VALUE>[;<ATTR>:<VALUE>]...

Example 1

Loadleveler command file
#@comment="HOSTLIST:node1,node2;QOS:special;SID:silverA"

Job must run on nodes node1 and node2 using the QoS special. The job is also associated with the system ID silverA allowing the silver daemon to monitor and control the job.

Example 2

PBS command file
#PBS -W x="NODESET:ONEOF:NETWORK;DMEM:64"

Job will have resources allocated subject to network based nodeset constraints. Further, each task will dedicate 64 MB of memory.

Example 3

qsub
> qsub -l nodes=4,walltime=1:00:00 -W x="FLAGS:ADVRES:john.1"

Job will be forced to run within the john.1 reservation.
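
The <ATTR>:<VALUE>[;<ATTR>:<VALUE>]... concatenation used in the examples above can be sketched as a pair of helpers that build and split an extension string. These are illustrative only; join_extensions and split_extensions are hypothetical names, and values are assumed not to contain semicolons.

```python
def join_extensions(extensions: dict) -> str:
    """Build '<ATTR>:<VALUE>[;<ATTR>:<VALUE>]...' from a mapping.

    Hypothetical helper; assumes values contain no ';' characters.
    """
    return ";".join(f"{attr}:{value}" for attr, value in extensions.items())


def split_extensions(spec: str) -> dict:
    """Split an extension string back into an attribute-to-value mapping."""
    result = {}
    for item in spec.split(";"):
        # Only the first colon separates the attribute from its value,
        # so values such as ONEOF:NETWORK keep their internal colons.
        attr, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in extension: {item!r}")
        result[attr] = value
    return result


spec = join_extensions({"NODESET": "ONEOF:NETWORK", "DMEM": "64"})
print(spec)
print(split_extensions(spec))
```

The resulting string is what would follow x= in a -W argument or appear in a comment field, depending on the resource manager.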
