All commands are requested via a socket interface, one command per socket connection. All fields and values are specified in ASCII text. Moab is configured to communicate via the wiki interface by specifying the following parameters in the moab.cfg file:
Field values must backslash escape the following characters if specified:
Only nodes updated more recently than <UPDATETIME> will be returned, where <UPDATETIME> is specified as the epoch time of interest. Setting <UPDATETIME> to '0' will return information for all nodes. Specify a colon-delimited list of NODEIDs if specific nodes are desired, or use the keyword 'ALL' to receive information for all nodes.
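The update-time and node-list filters described above can be modeled as a server-side selection function. This is a minimal sketch, assuming a hypothetical `Node` record that carries the epoch time of its last update. It models only the filtering rules, not the request syntax that carries these values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    update_time: int  # hypothetical: epoch time of the node's last update


def select_nodes(nodes: list[Node], update_time: int, node_ids: str = "ALL") -> list[Node]:
    """Return the nodes a resource query should report.

    update_time: epoch time of interest; only nodes updated more recently are
                 returned, and 0 returns every node.
    node_ids:    the keyword "ALL" or a colon-delimited list of NODEIDs.
    """
    wanted = None if node_ids == "ALL" else set(node_ids.split(":"))
    selected = []
    for node in nodes:
        if wanted is not None and node.node_id not in wanted:
            continue  # not one of the explicitly requested nodes
        if update_time != 0 and node.update_time <= update_time:
            continue  # not updated since the time of interest
        selected.append(node)
    return selected
```

For example, given nodes `n1` (updated at 100) and `n2` (updated at 200), `select_nodes(nodes, 150)` returns only `n2`, while `select_nodes(nodes, 0, "n1")` returns only `n1`.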
W.1.1.1.2 Wiki Resource Query Response Format
The query resources response format is one or more lines of the following format (separated by a newline, "\n"):
<NODEID> <ATTR>=<VALUE>[;<ATTR>=<VALUE>]...
<ATTR> is one of the names in the table below and the format of <VALUE> is dependent on <ATTR>.
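A response can be parsed line by line. Split off the NODEID at the first space, then split the remainder into <ATTR>=<VALUE> pairs on semicolons. The sketch below assumes that an escaped character is written as a backslash followed by that character, and keeps the character literally. `parse_resource_response` and `split_unescaped` are hypothetical helper names.

```python
from __future__ import annotations


def split_unescaped(text: str, sep: str) -> list[str]:
    """Split on `sep` wherever it is not backslash-escaped, dropping the escapes."""
    parts, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            current.append(next(chars, ""))  # keep the escaped character literally
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_resource_response(text: str) -> dict[str, dict[str, str]]:
    """Map each NODEID to its {ATTR: VALUE} dictionary."""
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        node_id, _, attrs = line.partition(" ")
        fields = {}
        for pair in split_unescaped(attrs.strip(), ";"):
            if pair:
                attr, _, value = pair.partition("=")  # value may itself contain '='
                fields[attr] = value
        nodes[node_id] = fields
    return nodes
```

For example, the line `node01 CPROC=4;CNET=ETHER:ATM` yields `{"node01": {"CPROC": "4", "CNET": "ETHER:ATM"}}`. Composite values such as CNET are left as strings for a later parsing step.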
W.1.1.1.3 Wiki Query Resources Example
request:
response:
W.1.1.1.4 Wiki Query Resources Data Format
NAME
FORMAT
DEFAULT
DESCRIPTION
ACLASS
one or more bracket-enclosed <NAME>:<COUNT> pairs (e.g., [batch:5][sge:3])
---
Run classes currently available on the node. If not specified, the scheduler will attempt to determine the actual ACLASS value.
ANET
one or more colon-delimited <STRING>s (e.g., ETHER:ATM)
---
Available network interfaces on node. Available interfaces are
those which are 'up' and not already dedicated to a job.
APROC
<INTEGER>
1
available processors on node
ARCH
<STRING>
---
compute architecture of node
ARES
one or more comma-delimited <NAME>:<VALUE> pairs (e.g., MATLAB:6,COMPILER:100)
---
Arbitrary consumable resources currently available on the node
ASWAP
<INTEGER>
0
available swap on node (in MB)
CCLASS
one or more bracket-enclosed <NAME>:<COUNT> pairs (e.g., [batch:5][sge:3])
---
Run classes supported by the node. Typically, one class instance is 'consumed' per task. Thus, an 8-processor node may have 8 instances of each class it supports (e.g., [batch:8][interactive:8]).
CDISK
<INTEGER>
0
configured local disk on node (in MB)
CFS
<STRING>
0
configured filesystem state
CMEMORY
<INTEGER>
0
configured RAM on node (in MB)
CNET
one or more colon-delimited <STRING>s (e.g., ETHER:FDDI:ATM)
---
configured network interfaces on node
CPROC
<INTEGER>
1
configured processors on node
CPULOAD
<DOUBLE>
0.0
one-minute BSD load average
CRES
one or more comma-delimited <NAME>:<VALUE> pairs (e.g., MATLAB:6,COMPILER:100)
---
Arbitrary consumable resources supported and tracked on the node (e.g., software licenses or tape drives).
CSWAP
<INTEGER>
0
configured swap on node (in MB)
CURRENTTASK
<INTEGER>
0
Number of tasks currently active on the node
EVENT
<STRING>
---
Event or exception which occurred on the node
FEATURE
one or more colon-delimited <STRING>s (e.g., WIDE:HSM)
---
generic attributes, often describing hardware or software features,
associated with the node.
GCOUNTER
<INTEGER>
---
current total number of gevent event occurrences since epoch. This value should be monotonically increasing.
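Several node attributes carry structured values rather than plain strings. The sketch below parses the three composite formats in the table:

- bracket-enclosed <NAME>:<COUNT> pairs (ACLASS, CCLASS)
- colon-delimited lists (CNET, FEATURE)
- comma-delimited <NAME>:<VALUE> pairs (ARES, CRES)

It assumes that counts and resource values are integers, as in the table's examples. The function names are hypothetical.

```python
from __future__ import annotations

import re

# One "[name:count]" group; names may not contain brackets or colons.
_CLASS_PAIR = re.compile(r"\[([^\[\]:]+):(\d+)\]")


def parse_class_pairs(value: str) -> dict[str, int]:
    """Parse bracket-enclosed pairs such as '[batch:5][sge:3]'."""
    return {name: int(count) for name, count in _CLASS_PAIR.findall(value)}


def parse_colon_list(value: str) -> list[str]:
    """Parse colon-delimited lists such as 'ETHER:FDDI:ATM' or 'WIDE:HSM'."""
    return [item for item in value.split(":") if item]


def parse_resource_pairs(value: str) -> dict[str, int]:
    """Parse comma-delimited pairs such as 'MATLAB:6,COMPILER:100'."""
    result = {}
    for item in value.split(","):
        if item:
            name, _, amount = item.partition(":")
            result[name] = int(amount)  # assumes integer resource quantities
    return result
```

For example, `parse_class_pairs("[batch:8][interactive:8]")` gives `{"batch": 8, "interactive": 8}`.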
Only jobs updated more recently than <UPDATETIME> will be returned, where <UPDATETIME> is specified as the epoch time of interest. Setting <UPDATETIME> to '0' will return information for all jobs. Specify a colon-delimited list of JOBIDs if information for specific jobs is desired, or use the keyword 'ALL' to receive information about all jobs.
quantity of local disk space (in MB) which must be dedicated to each
task of the job
DPROCS
<INTEGER>
1
number of processors dedicated per task
DNETWORK
<STRING>
---
network adapter which must be dedicated to job
DSWAP
<INTEGER>
0
quantity of virtual memory (swap, in MB) which must be dedicated to
each task of the job
ENDDATE
<EPOCHTIME>
[ANY]
time by which job must complete
ENV
<STRING>
---
job environment variables
EVENT
<EVENT>
---
event or exception experienced by job
ERROR
<STRING>
---
file to contain STDERR
EXEC
<STRING>
---
job executable command
EXITCODE
<INTEGER>
---
job exit code
FLAGS
<STRING>
---
job flags
GEOMETRY
<STRING>
---
String describing task geometry required by job
GNAME*
<STRING>
---
GroupID under which job will run
HOSTLIST
comma- or colon-delimited list of hostnames; suffix the hostlist with a caret (^) to mean superset or with an asterisk (*) to mean subset; otherwise, the hostlist is interpreted as an exact set
[ANY]
list of required hosts on which the job must run (see TASKLIST)
INPUT
<STRING>
---
file containing STDIN
IWD
<STRING>
---
job's initial working directory
NAME
<STRING>
---
User specified name of job
NODERANGE
<INTEGER>[,<INTEGER>]
---
Minimum and maximum nodes allowed to be allocated to job. Used for dynamic jobs.
NODES
<INTEGER>
1
Number of nodes required by job (See Node Definition for more info)
OUTPUT
<STRING>
---
file to contain STDOUT
PARTITIONMASK
one or more colon-delimited <STRING>s
[ANY]
list of partitions in which job can run
PREF
colon-delimited list of <STRING>s
---
List of preferred node features
PRIORITY
<INTEGER>
---
system priority (absolute or relative - use '+' and '-' to specify relative)
QOS
<INTEGER>
0
quality of service requested
QUEUETIME*
<EPOCHTIME>
0
time job was submitted to resource manager
RARCH
<STRING>
---
architecture required by job
RCLASS
list of bracket-enclosed <STRING>:<INTEGER> pairs
---
list of <CLASSNAME>:<COUNT> pairs indicating the type and number of class instances required per task (e.g., '[batch:1]' or '[batch:2][tape:1]')
RDISK
<INTEGER>
0
local disk space (in MB) required to be configured on nodes allocated
to the job
RDISKCMP
one of '>=', '>', '==', '<', or '<='
>=
local disk comparison (e.g., node must have > 2048 MB local disk)
REJCODE
<INTEGER>
0
reason job was rejected
REJCOUNT
<INTEGER>
0
number of times job was rejected
REJMESSAGE
<STRING>
---
text description of reason job was rejected
REQRSV
<STRING>
---
Name of reservation in which job must run
RESACCESS
<STRING>
---
List of reservations in which job can run
RFEATURES
colon-delimited list of <STRING>s
---
List of features required on nodes
RMEM
<INTEGER>
0
real memory (RAM, in MB) required to be configured on nodes allocated
to the job
RMEMCMP
one of '>=', '>', '==', '<', or '<='
>=
real memory comparison (e.g., node must have >= 512 MB RAM)
RNETWORK
<STRING>
---
network adapter required by job
ROPSYS
<STRING>
---
operating system required by job
RSWAP
<INTEGER>
0
virtual memory (swap, in MB) required to be configured on nodes allocated
to the job
RSWAPCMP
one of '>=', '>', '==', '<', or '<='
>=
virtual memory comparison (e.g., node must have == 4096 MB virtual memory)
SID
<STRING>
---
system id (global job system owner)
SJID
<STRING>
---
system job id (global job id)
STARTDATE
<EPOCHTIME>
0
earliest time job should be allowed to start
STARTTIME*
<EPOCHTIME>
0
time job was started by the resource manager
STATE*
one of Idle, Running, Hold, Suspended, Completed, or Removed
Idle
State of job
SUSPENDTIME
<INTEGER>
0
Number of seconds job has been suspended
TARGETBACKLOG
<DOUBLE>[,<DOUBLE>]
---
Minimum and maximum backlog for application within job. In the case of dynamic jobs, Moab allocates/deallocates resources as needed to keep the job within the target range.
TARGETLOAD
<DOUBLE>[,<DOUBLE>]
---
Minimum and maximum load for application within job. In the case of dynamic jobs, Moab allocates/deallocates resources as needed to keep the job within the target range.
TARGETRESPONSETIME
<DOUBLE>[,<DOUBLE>]
---
Minimum and maximum response time for application within job. In the case of dynamic jobs, Moab allocates/deallocates resources as needed to keep the job within the target range.
TARGETTHROUGHPUT
<DOUBLE>[,<DOUBLE>]
---
Minimum and maximum throughput for application within job. In the case of dynamic jobs, Moab allocates/deallocates resources as needed to keep the job within the target range.
TARGETVIOLATIONTIME
<ALLOCATIONTIME>[,<DEALLOCATIONTIME>] where values are specified using the format [[[DD:]HH:]MM:]SS
---
Amount of time an application performance target must be exceeded before Moab adjusts the resource allocation of a dynamic job. By default, Moab allocates/deallocates resources as soon as a performance target violation is detected.
TASKLIST
one or more comma-delimited <STRING>s
---
list of allocated tasks; that is, a comma-delimited list of the node IDs associated with each active task of the job (e.g., cl01,cl02,cl01,cl02,cl03). The task list is initially selected by the scheduler when the StartJob command is issued. The resource manager is then responsible for starting the job on these nodes and maintaining this task distribution information throughout the life of the job (see HOSTLIST).
TASKS*
<INTEGER>
1
Number of tasks required by job (See Task Definition for more info)
TASKPERNODE
<INTEGER>
0
exact number of tasks required per node
UNAME*
<STRING>
---
UserID under which job will run
UPDATETIME*
<EPOCHTIME>
0
Time job was last updated
WCLIMIT*
[[HH:]MM:]SS
864000
walltime required by job
* indicates required field
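Several job fields need interpretation before a scheduler can act on them. The sketch below shows hypothetical helpers for four of them:

- converting WCLIMIT or TARGETVIOLATIONTIME durations to seconds (so the WCLIMIT default of 864000 is 10 days)
- checking an allocation against HOSTLIST
- applying an RDISKCMP, RMEMCMP, or RSWAPCMP operator
- counting tasks per node in a TASKLIST

The reading of the HOSTLIST suffixes used here is an interpretation rather than a confirmed rule. Under '^', the allocation must include every listed host. Under '*', it may use only listed hosts.

```python
from __future__ import annotations

import operator
import re
from collections import Counter


def parse_duration(text: str) -> int:
    """Convert [[[DD:]HH:]MM:]SS (WCLIMIT, TARGETVIOLATIONTIME) to seconds."""
    parts = [int(p) for p in text.split(":")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"bad duration: {text!r}")
    weights = [86400, 3600, 60, 1][-len(parts):]  # rightmost field is seconds
    return sum(p * w for p, w in zip(parts, weights))


def hostlist_satisfied(hostlist: str, allocated: set[str]) -> bool:
    """Check allocated hosts against a HOSTLIST (assumed suffix semantics)."""
    suffix = hostlist[-1:]
    mode = suffix if suffix in ("^", "*") else "exact"
    body = hostlist if mode == "exact" else hostlist[:-1]
    hosts = {h for h in re.split(r"[,:]", body) if h}
    if mode == "^":  # superset: every listed host must be allocated
        return allocated >= hosts
    if mode == "*":  # subset: only listed hosts may be allocated
        return allocated <= hosts
    return allocated == hosts  # exact set


_CMP = {">=": operator.ge, ">": operator.gt, "==": operator.eq,
        "<": operator.lt, "<=": operator.le}


def meets_requirement(configured: int, required: int, cmp: str = ">=") -> bool:
    """Evaluate '<configured> <cmp> <required>', as for RDISKCMP/RMEMCMP/RSWAPCMP."""
    return _CMP[cmp](configured, required)


def tasks_per_node(tasklist: str) -> Counter:
    """Count how many tasks a TASKLIST places on each node."""
    return Counter(h.strip() for h in tasklist.split(",") if h.strip())
```

For example, `parse_duration("1:30:00")` gives 5400 seconds. `tasks_per_node("cl01,cl02,cl01,cl02,cl03")` reports two tasks each on cl01 and cl02 and one on cl03.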
NOTE: Job states have the following definitions:
Completed:
Job has completed
Hold:
Job is in the queue but is not allowed to run
Idle:
Job is ready to run
Removed:
Job has been canceled or otherwise terminated externally
Running:
Job is currently executing
Suspended:
Job has started, but execution has been temporarily suspended
NOTE: Completed and canceled jobs should
be maintained by the resource manager for a brief time, perhaps 1 to 5
minutes, before being purged. This provides the scheduler time to
obtain all final job state information for scheduler statistics.
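On the resource manager side, the retention rule in this note can be implemented as a purge pass. The pass keeps jobs in a final state (Completed or Removed) for a grace period before dropping them. This sketch assumes a hypothetical `finished_at` timestamp on each job record, which is not one of the wiki fields above. It uses 300 seconds, the upper end of the suggested range, as the default.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

FINAL_STATES = frozenset({"Completed", "Removed"})


@dataclass
class JobRecord:
    job_id: str
    state: str
    finished_at: Optional[int] = None  # hypothetical: epoch time the job reached a final state


def purge_finished_jobs(jobs: list[JobRecord], now: int, retention: int = 300) -> list[JobRecord]:
    """Drop jobs that have been in a final state for longer than `retention` seconds."""
    kept = []
    for job in jobs:
        expired = (
            job.state in FINAL_STATES
            and job.finished_at is not None
            and now - job.finished_at > retention
        )
        if not expired:
            kept.append(job)  # active jobs, and finished jobs still inside the grace period
    return kept
```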
1.1.3 StartJob
The 'StartJob' command may only be applied to jobs
in the 'Idle' state. It causes the job to begin running using the
resources listed in the NodeID list.
STATUSCODE
>= 0 indicates SUCCESS
STATUSCODE
< 0 indicates FAILURE
RESPONSE
is a text message possibly further describing an error or state
1.1.4 CancelJob
The 'CancelJob' command, if applied to an active job, will terminate its execution. Whether applied to an idle or an active job, the CancelJob command will change the job's state to 'Removed', the state used for canceled jobs.
send CMD=CANCELJOB ARG=<JOBID> TYPE=<CANCELTYPE>
<CANCELTYPE> is one of the following:
ADMIN (command initiated by the scheduler administrator)
WALLCLOCK (command initiated by the scheduler because the job exceeded its specified wallclock limit)
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE
>= 0 indicates SUCCESS
STATUSCODE
< 0 indicates FAILURE
RESPONSE
is a text message further describing an error or state
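Because each command uses its own socket connection, a client only needs to open a connection, send one request, and read one 'SC=<STATUSCODE> RESPONSE=<RESPONSE>' reply. This is a minimal sketch with hypothetical helper names. The framing it uses is an assumption: send the ASCII request, half-close the socket, then read until the peer closes. The actual interface may require additional framing, checksums, or authentication not described in this section.

```python
from __future__ import annotations

import socket


def build_command(cmd: str, job_id: str, **extra: str) -> str:
    """Build a request such as 'CMD=CANCELJOB ARG=job.1 TYPE=ADMIN'."""
    fields = [f"CMD={cmd}", f"ARG={job_id}"]
    fields += [f"{key}={value}" for key, value in extra.items()]
    return " ".join(fields)


def parse_reply(reply: str) -> tuple[int, str]:
    """Split 'SC=<STATUSCODE> RESPONSE=<RESPONSE>' into (status code, message)."""
    head, _, rest = reply.strip().partition(" ")
    if not head.startswith("SC="):
        raise ValueError(f"unexpected reply: {reply!r}")
    code = int(head[len("SC="):])
    message = rest[len("RESPONSE="):] if rest.startswith("RESPONSE=") else rest
    return code, message


def send_command(host: str, port: int, request: str, timeout: float = 10.0) -> tuple[int, str]:
    """Send one command on a fresh connection (assumed framing) and parse the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        sock.shutdown(socket.SHUT_WR)  # signal end of request (assumption)
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return parse_reply(b"".join(chunks).decode("ascii"))
```

Consider `send_command(host, port, build_command("CANCELJOB", "job.1", TYPE="ADMIN"))`, where `job.1` is a hypothetical job ID. It would request an administrative cancel and return `(status, message)`. A status >= 0 indicates success.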
1.1.5 SuspendJob
The 'SuspendJob' command can only be issued against a job in the state 'Running'. This command suspends job execution and results in the job changing to the 'Suspended' state.
send CMD=SUSPENDJOB ARG=<JOBID>
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE
>= 0 indicates SUCCESS
STATUSCODE
< 0 indicates FAILURE
RESPONSE
is a text message possibly further describing an error or state
1.1.6 ResumeJob
The 'ResumeJob' command can only be issued against a job in the state 'Suspended'. This command resumes a suspended job, returning it to the 'Running' state.
send CMD=RESUMEJOB ARG=<JOBID>
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE
>= 0 indicates SUCCESS
STATUSCODE
< 0 indicates FAILURE
RESPONSE
is a text message further describing an error or state
1.1.7 RequeueJob
The 'RequeueJob' command can only be issued against an active job in the state 'Starting' or 'Running'. This command requeues the job, stopping execution and returning the job to an idle state in the queue. The requeued job will be eligible for execution the next time resources are available.
send CMD=REQUEUEJOB ARG=<JOBID>
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE
>= 0 indicates SUCCESS
STATUSCODE
< 0 indicates FAILURE
RESPONSE
is a text message further describing an error or state
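The state preconditions given for StartJob, SuspendJob, ResumeJob, and RequeueJob can be collected into a small transition table. A resource manager could check this table before acting on a request. This is a sketch under two assumptions. First, the command keys are the section names rather than wire-level CMD values. Second, StartJob is assumed to lead directly to 'Running', although a resource manager may first report an intermediate 'Starting' state.

```python
# command -> (states in which the command is accepted, state after success)
TRANSITIONS = {
    "StartJob": (frozenset({"Idle"}), "Running"),
    "SuspendJob": (frozenset({"Running"}), "Suspended"),
    "ResumeJob": (frozenset({"Suspended"}), "Running"),
    "RequeueJob": (frozenset({"Starting", "Running"}), "Idle"),
}


def apply_command(state: str, command: str) -> str:
    """Return the job's new state, or raise ValueError if the command is not allowed."""
    allowed, new_state = TRANSITIONS[command]
    if state not in allowed:
        raise ValueError(f"{command} cannot be applied to a job in state {state!r}")
    return new_state
```

A handler that hits the ValueError would report it to the scheduler as a negative STATUSCODE with an explanatory RESPONSE.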
1.1.8 SignalJob
The 'SignalJob' command can only be issued against an active job in the state 'Starting' or 'Running'. This command signals the job, sending the specified signal to the master process. The signalled job will remain in the same state it was in before the signal was issued.