Wiki Interface Specification, version 1.1


COMMANDS:

    All commands are requested via a socket interface, one command per socket connection. All fields and values are specified in ASCII text.  Maui is configured to communicate via the wiki interface by specifying the following parameters in the maui.cfg file:

    RMTYPE[X]        WIKI
    RMSERVER[X]  <HOSTNAME>
    RMPORT[X]       <PORTNUMBER>
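As a rough sketch of the "one command per socket connection" rule, a client might open a connection, write a single command, read the reply, and disconnect. The function name `send_wiki_command`, the timeout, and the framing (raw ASCII text, reply terminated when the server closes the connection) are assumptions made for illustration; this specification does not define message framing, and a real deployment may require more than is shown here.

```python
import socket


def send_wiki_command(host: str, port: int, command: str, timeout: float = 10.0) -> str:
    """Send one command on a fresh connection and return the raw reply text.

    Assumption: the reply is complete once the server closes the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        # Tell the server that nothing more will be written on this connection.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii")
```

A new connection is opened on every call, so the function never reuses a socket for a second command.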

    Field values must backslash escape the following characters if specified:

        '#'  ';'  ':'      (ie  '\#')
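The escaping rule above can be sketched as a pair of helpers. The names `escape_value` and `unescape_value` are hypothetical, and the sketch assumes a backslash itself needs no escaping, since the specification does not list it.

```python
import re

# Characters the specification says must be backslash-escaped in field values.
_SPECIAL = re.compile(r"([#;:])")


def escape_value(value: str) -> str:
    """Prefix each '#', ';' and ':' in value with a backslash."""
    return _SPECIAL.sub(r"\\\1", value)


def unescape_value(value: str) -> str:
    """Reverse escape_value by dropping the backslash before '#', ';' or ':'."""
    return re.sub(r"\\([#;:])", r"\1", value)
```

For instance, `escape_value("a:b")` yields the four-character string `a\:b`.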

    Supported Commands are:

        GETNODES, GETJOBS, STARTJOB, CANCELJOB, SUSPENDJOB, RESUMEJOB, JOBADDTASK, JOBRELEASETASK



GetNodes

    send

        CMD=GETNODES ARG={<UPDATETIME>:<NODEID>[:<NODEID>]... | <UPDATETIME>:ALL}

        Only nodes updated more recently than <UPDATETIME> will be returned where <UPDATETIME> is specified as the epoch time of interest.  Setting <UPDATETIME> to '0' will return information for all nodes.  Specify a colon delimited list of NODEID's if specific nodes are desired or use the keyword 'ALL' to receive information for all nodes.
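A minimal sketch of assembling the request string described above. The helper name `build_getnodes` is hypothetical, and treating an empty or missing node list as 'ALL' is a convenience assumed here, not something the specification requires.

```python
def build_getnodes(update_time: int = 0, node_ids=None) -> str:
    """Return a GETNODES command string.

    update_time is an epoch time; 0 requests every node regardless of age.
    node_ids is an optional list of node names; None or [] selects 'ALL'.
    """
    if update_time < 0:
        raise ValueError("update_time must be a non-negative epoch time")
    targets = ":".join(node_ids) if node_ids else "ALL"
    return f"CMD=GETNODES ARG={update_time}:{targets}"
```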

    receive

        SC=<STATUSCODE> ARG=<NODECOUNT>#<NODEID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...[#<NODEID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...]...

        or

        SC=<STATUSCODE> RESPONSE=<RESPONSE>

    STATUSCODE Values:

        0     SUCCESS
      -1     INTERNAL ERROR

    FIELD             is either the text name listed below or 'A<FIELDNUM>' (ie, 'UPDATETIME' or 'A2')

    RESPONSE   is a statuscode sensitive message describing error or state details

    EXAMPLE:

        send 'CMD=GETNODES ARG=0:node001:node002:node003'

        receive 'SC=0 ARG=3#node001:UPDATETIME=963004212;STATE=Busy;OS=AIX43;ARCH=RS6000...'
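The receive format can be parsed along the following lines. This is a sketch under stated assumptions: the name `parse_records` is hypothetical, a delimiter counts only when it is not preceded by a backslash, the object ID ends at the first unescaped ':', and values are returned still escaped. The same shape applies to GETJOBS replies, which use the same record layout.

```python
import re

_UNESCAPED = r"(?<!\\)"  # lookbehind: delimiter not preceded by a backslash


def parse_records(reply: str):
    """Parse 'SC=<code> ARG=<count>#<id>:<f>=<v>;...' into (code, count, records).

    records maps each object ID to a dict of field name -> raw value.
    Raises ValueError when the reply carries RESPONSE= instead of ARG=.
    """
    match = re.match(r"SC=(-?\d+) (ARG|RESPONSE)=(.*)", reply.strip(), re.DOTALL)
    if match is None:
        raise ValueError(f"unrecognised reply: {reply!r}")
    code = int(match.group(1))
    if match.group(2) == "RESPONSE":
        raise ValueError(f"status {code}: {match.group(3)}")
    # The first '#'-separated piece is the declared record count.
    count_text, *chunks = re.split(_UNESCAPED + "#", match.group(3))
    records = {}
    for chunk in chunks:
        parts = re.split(_UNESCAPED + ":", chunk, maxsplit=1)
        object_id = parts[0]
        field_text = parts[1] if len(parts) > 1 else ""
        fields = {}
        for item in re.split(_UNESCAPED + ";", field_text):
            if item:  # a trailing ';' leaves an empty item
                name, _, value = item.partition("=")
                fields[name] = value
        records[object_id] = fields
    return code, int(count_text), records
```

Returning the declared count separately lets the caller decide how to react when it differs from the number of records actually received.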

Field Values
 
INDEX | NAME | FORMAT | DEFAULT | DESCRIPTION
1 | UPDATETIME* | <EPOCHTIME> | 0 | time node information was last updated
2 | STATE* | one of the following: Idle, Running, Busy, Unknown, Draining, or Down | Down | state of the node
3 | OS | <STRING> | [NONE] | operating system running on node
4 | ARCH | <STRING> | [NONE] | compute architecture of node
5 | CMEMORY | <INTEGER> | 0 | configured RAM on node (in MB)
6 | AMEMORY | <INTEGER> | 0 | available/free RAM on node (in MB)
7 | CSWAP | <INTEGER> | 0 | configured swap on node (in MB)
8 | ASWAP | <INTEGER> | 0 | available swap on node (in MB)
9 | CDISK | <INTEGER> | 0 | configured local disk on node (in MB)
10 | ADISK | <INTEGER> | 0 | available local disk on node (in MB)
11 | CPROC | <INTEGER> | 1 | configured processors on node
12 | APROC | <INTEGER> | 1 | available processors on node
13 | CNET | one or more colon delimited <STRING>'s (ie, ETHER:FDDI:ATM) | [NONE] | configured network interfaces on node
14 | ANET | one or more colon delimited <STRING>'s (ie, ETHER:ATM) | [NONE] | available network interfaces on node.  Available interfaces are those which are 'up' and not already dedicated to a job.
15 | CPULOAD | <DOUBLE> | 0.0 | one minute BSD load average
16 | CCLASS | one or more bracket enclosed <NAME>:<COUNT> pairs (ie, [batch:5][sge:3]) | [NONE] | run classes supported by node.  Typically, one class is 'consumed' per task.  Thus, an 8 processor node may have 8 instances of each class it supports present, ie [batch:8][interactive:8]
17 | ACLASS | one or more bracket enclosed <NAME>:<COUNT> pairs (ie, [batch:5][sge:3]) | [NONE] | run classes currently available on node.  If not specified, scheduler will attempt to determine actual ACLASS value.
18 | FEATURE | one or more colon delimited <STRING>'s (ie, WIDE:HSM) | [NONE] | generic attributes, often describing hardware or software features, associated with the node.
19 | PARTITION | <STRING> | DEFAULT | partition to which node belongs
20 | EVENT | <STRING> | [NONE] | event or exception which occurred on the node
21 | CURRENTTASK | <INTEGER> | 0 | number of tasks currently active on the node
22 | MAXTASK | <INTEGER> | <CPROC> | maximum number of tasks allowed on the node at any given time
23 | SPEED | <DOUBLE> | 1.0 | relative processor speed of the node
24 | FRAME | <INTEGER> | 0 | frame location of the node
25 | SLOT | <INTEGER> | 0 | slot location of the node
26 | CRES | one or more colon delimited <NAME>,<VALUE> pairs (ie, MATLAB,6:COMPILER,100) | [NONE] | arbitrary consumable resources supported and tracked on the node, ie software licenses or tape drives.
27 | ARES | one or more colon delimited <NAME>,<VALUE> pairs (ie, MATLAB,6:COMPILER,100) | [NONE] | arbitrary consumable resources currently available on the node

* indicates required field
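Because a field may be given either by name or as 'A<FIELDNUM>', a receiver may want to normalise both forms. The sketch below assumes <FIELDNUM> is the 1-based INDEX column of the node field table; the names `NODE_FIELDS` and `canonical_field` are hypothetical.

```python
# Node field names in INDEX order, copied from the node field table.
NODE_FIELDS = [
    "UPDATETIME", "STATE", "OS", "ARCH", "CMEMORY", "AMEMORY", "CSWAP",
    "ASWAP", "CDISK", "ADISK", "CPROC", "APROC", "CNET", "ANET", "CPULOAD",
    "CCLASS", "ACLASS", "FEATURE", "PARTITION", "EVENT", "CURRENTTASK",
    "MAXTASK", "SPEED", "FRAME", "SLOT", "CRES", "ARES",
]


def canonical_field(name: str, fields=NODE_FIELDS) -> str:
    """Map an 'A<FIELDNUM>' alias to its text name; pass text names through."""
    if name.startswith("A") and name[1:].isdecimal():
        index = int(name[1:])  # assumed to be 1-based, matching INDEX
        if not 1 <= index <= len(fields):
            raise ValueError(f"field number out of range: {name}")
        return fields[index - 1]
    if name not in fields:
        raise ValueError(f"unknown field: {name}")
    return name
```

Names such as 'ARCH' or 'AMEMORY' also begin with 'A', so the alias test requires the remainder to be purely numeric.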

NOTE 1:  node states have the following definitions:
    Idle:                Node is ready to run jobs but currently is not running any.
    Running:       Node is running some jobs and will accept additional jobs
    Busy:              Node is running some jobs and will not accept additional jobs
    Unknown:     Node is capable of running jobs but the scheduler will need to determine if the node state is actually Idle, Running, or Busy.
    Draining:      Node is responding but will not accept new jobs
    Down:            Resource Manager problems have been detected.  Node is incapable of running jobs.



GetJobs

    send

        CMD=GETJOBS ARG={<UPDATETIME>:<JOBID>[:<JOBID>]... | <UPDATETIME>:ALL }

        Only jobs updated more recently than <UPDATETIME> will be returned where <UPDATETIME> is specified as the epoch time of interest.  Setting <UPDATETIME> to '0' will return information for all jobs.  Specify a colon delimited list of JOBID's if information for specific jobs is desired or use the keyword 'ALL' to receive information about all jobs.

    receive

        SC=<STATUSCODE> ARG=<JOBCOUNT>#<JOBID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...[#<JOBID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...]...

        or

        SC=<STATUSCODE> RESPONSE=<RESPONSE>
 
        FIELD      is either the text name listed below or 'A<FIELDNUM>'
                      (ie, 'UPDATETIME' or 'A2')

        STATUSCODE values:

             0    SUCCESS
            -1   INTERNAL ERROR

        RESPONSE   is a statuscode sensitive message describing error or state details

        EXAMPLE:

             send 'CMD=GETJOBS ARG=0:ALL'

             receive 'SC=0 ARG=2#nebo3001.0:UPDATETIME=9780000320;STATE=Idle;WCLIMIT=3600;...'

Table of Job Field Values

INDEX | NAME | FORMAT | DEFAULT | DESCRIPTION
1 | UPDATETIME* | <EPOCHTIME> | 0 | time job was last updated
2 | STATE* | one of Idle, Running, Hold, Suspended, Completed, or Cancelled | Idle | state of job
3 | WCLIMIT* | <INTEGER> | 864000 | seconds of wall time required by job
4 | TASKS* | <INTEGER> | 1 | number of tasks required by job
5 | NODES | <INTEGER> | 1 | number of nodes required by job
6 | GEOMETRY | <STRING> | [NONE] | string describing task geometry required by job
7 | QUEUETIME* | <EPOCHTIME> | 0 | time job was submitted to resource manager
8 | STARTDATE | <EPOCHTIME> | 0 | earliest time job should be allowed to start
9 | STARTTIME* | <EPOCHTIME> | 0 | time job was started by the resource manager
10 | COMPLETIONTIME* | <EPOCHTIME> | 0 | time job completed execution
11 | UNAME* | <STRING> | [NONE] | UserID under which job will run
12 | GNAME* | <STRING> | [NONE] | GroupID under which job will run
13 | ACCOUNT | <STRING> | [NONE] | AccountID associated with job
14 | RFEATURES | colon delimited list of <STRING>'s | [NONE] | list of features required on nodes
15 | RNETWORK | <STRING> | [NONE] | network adapter required by job
16 | DNETWORK | <STRING> | [NONE] | network adapter which must be dedicated to job
17 | RCLASS | list of bracket enclosed <STRING>:<INTEGER> pairs | [NONE] | list of <CLASSNAME>:<COUNT> pairs indicating type and number of class instances required per task.  (ie, '[batch:1]' or '[batch:2][tape:1]')
18 | ROPSYS | <STRING> | [NONE] | operating system required by job
19 | RARCH | <STRING> | [NONE] | architecture required by job
20 | RMEM | <INTEGER> | 0 | real memory (RAM, in MB) required to be configured on nodes allocated to the job
21 | RMEMCMP | one of '>=', '>', '==', '<', or '<=' | >= | real memory comparison (ie, node must have >= 512MB RAM)
22 | DMEM | <INTEGER> | 0 | quantity of real memory (RAM, in MB) which must be dedicated to each task of the job
23 | RDISK | <INTEGER> | 0 | local disk space (in MB) required to be configured on nodes allocated to the job
24 | RDISKCMP | one of '>=', '>', '==', '<', or '<=' | >= | local disk comparison (ie, node must have > 2048 MB local disk)
25 | DDISK | <INTEGER> | 0 | quantity of local disk space (in MB) which must be dedicated to each task of the job
26 | RSWAP | <INTEGER> | 0 | virtual memory (swap, in MB) required to be configured on nodes allocated to the job
27 | RSWAPCMP | one of '>=', '>', '==', '<', or '<=' | >= | virtual memory comparison (ie, node must have == 4096 MB virtual memory)
28 | DSWAP | <INTEGER> | 0 | quantity of virtual memory (swap, in MB) which must be dedicated to each task of the job
29 | PARTITIONMASK | one or more colon delimited <STRING>s | [ANY] | list of partitions in which job can run
30 | EXEC | <STRING> | [NONE] | job executable command
31 | IWD | <STRING> | [NONE] | job's initial working directory
32 | COMMENT | <STRING> | 0 | general job attributes not described by other fields
33 | REJCOUNT | <INTEGER> | 0 | number of times job was rejected
34 | REJMESSAGE | <STRING> | [NONE] | text description of reason job was rejected
35 | REJCODE | <INTEGER> | 0 | reason job was rejected
36 | EVENT | <EVENT> | [NONE] | event or exception experienced by job
37 | TASKLIST | one or more colon delimited <STRING>s | [NONE] | nodeid associated with each active task of job (ie, cl01:cl02:cl01:cl02:cl03)
38 | TASKPERNODE | <INTEGER> | 0 | exact number of tasks required per node
39 | QOS | <INTEGER> | 0 | quality of service requested
40 | ENDDATE | <EPOCHTIME> | [ANY] | time by which job must complete
41 | CBSERVER | <STRING>[:<INTEGER>] | [NONE] | location of server which will handle callback requests in <HOSTNAME>:<PORT> format
42 | CBTYPE | one or more of the following delimited by colons: CANCEL and START | START:CANCEL | list of callback types requested by job
43 | DPROCS | <INTEGER> | 1 | number of processors dedicated per task
44 | SUSPENDTIME | <INTEGER> | 0 | number of seconds job has been suspended
45 | RESERVATION | <STRING> | [NONE] | name of reservation in which job must run

* indicates required field
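Two of the compound job field formats, the bracket enclosed class pairs (RCLASS) and the colon delimited TASKLIST, can be decoded roughly as follows. The helper names `parse_class_pairs` and `tasks_per_node` are hypothetical, and the sketch assumes TASKLIST repeats a node ID once per task placed on that node.

```python
import re
from collections import Counter


def parse_class_pairs(text: str) -> dict:
    """Turn '[batch:2][tape:1]' into {'batch': 2, 'tape': 1}.

    Text without any '[<NAME>:<COUNT>]' pair, such as '[NONE]', gives {}.
    """
    pairs = re.findall(r"\[([^\[\]:]+):(\d+)\]", text)
    return {name: int(count) for name, count in pairs}


def tasks_per_node(tasklist: str) -> Counter:
    """Count how many tasks a colon delimited TASKLIST places on each node."""
    return Counter(node for node in tasklist.split(":") if node)
```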

NOTE 1:      job states have the following definitions:
      Idle:                job is ready to run
      Running:       job is currently executing
      Hold:              job is in the queue but is not allowed to run
      Suspended:   job has started but execution has temporarily been suspended
      Completed:  job has completed
      Cancelled:    job has been cancelled

NOTE 2:     completed and cancelled jobs should be maintained by the resource manager for a brief time, perhaps 1 to 5 minutes, before being purged.  This provides the scheduler time to obtain all final job state information for scheduler statistics.


StartJob

    The 'StartJob' command may only be applied to jobs in the 'Idle' state.  It causes the job to begin running using the resources listed in the TASKLIST argument, a colon delimited list of node IDs.

    send     CMD=STARTJOB ARG=<JOBID> TASKLIST=<NODEID>[:<NODEID>]...

    receive  SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE <  0 indicates FAILURE
           RESPONSE   is a text message possibly further describing an error or state

    EXAMPLE:

        Start job nebo.1 on nodes cluster001 and cluster002

        send 'CMD=STARTJOB ARG=nebo.1 TASKLIST=cluster001:cluster002'

        receive 'SC=0 RESPONSE=job nebo.1 started with 2 tasks'
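The StartJob exchange might be handled as sketched below. `build_startjob` and `parse_status` are hypothetical names; the parser applies the rule that a status code of zero or above means success, and it also tolerates a ';' between the two reply fields purely as a defensive assumption.

```python
import re


def build_startjob(job_id: str, node_ids) -> str:
    """Return a STARTJOB command whose TASKLIST names the given node IDs."""
    if not node_ids:
        raise ValueError("STARTJOB needs at least one node ID")
    return f"CMD=STARTJOB ARG={job_id} TASKLIST={':'.join(node_ids)}"


def parse_status(reply: str):
    """Split 'SC=<code> RESPONSE=<text>' into (succeeded, code, text)."""
    match = re.match(r"SC=(-?\d+)[ ;]RESPONSE=(.*)", reply.strip(), re.DOTALL)
    if match is None:
        raise ValueError(f"unrecognised reply: {reply!r}")
    code = int(match.group(1))
    # STATUSCODE >= 0 indicates success, < 0 indicates failure.
    return code >= 0, code, match.group(2)
```

The same `parse_status` shape fits every command in this specification that replies with SC and RESPONSE only.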



CancelJob

    The 'CancelJob' command, if applied to an active job, will terminate its execution.  If applied to an idle or active job, the CancelJob command will change the job's state to 'Cancelled'.

    send     CMD=CANCELJOB ARG=<JOBID> TYPE=<CANCELTYPE>

    <CANCELTYPE> is one of the following:

    ADMIN               (command initiated by scheduler administrator)
    WALLCLOCK (command initiated by scheduler because job exceeded its specified wallclock limit)

    receive  SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE <  0 indicates FAILURE
           RESPONSE   is a text message further describing an error or state

    EXAMPLE:

        Cancel job nebo.2

        send 'CMD=CANCELJOB ARG=nebo.2 TYPE=ADMIN'

        receive 'SC=0 RESPONSE=job nebo.2 cancelled'
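A small sketch of building the CancelJob request while rejecting any TYPE other than the two listed above. The names `CANCEL_TYPES` and `build_canceljob` are hypothetical.

```python
# The two <CANCELTYPE> values listed in this section.
CANCEL_TYPES = ("ADMIN", "WALLCLOCK")


def build_canceljob(job_id: str, cancel_type: str) -> str:
    """Return a CANCELJOB command, validating the cancel type first."""
    if cancel_type not in CANCEL_TYPES:
        raise ValueError(f"cancel_type must be one of {CANCEL_TYPES}")
    return f"CMD=CANCELJOB ARG={job_id} TYPE={cancel_type}"
```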



SuspendJob

    The 'SuspendJob' command can only be issued against a job in the state 'Running'.  This command suspends job execution and results in the job changing to the 'Suspended' state.
 
    send     CMD=SUSPENDJOB ARG=<JOBID>

    receive  SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE <  0 indicates FAILURE
           RESPONSE   is a text message possibly further describing an error or state

    EXAMPLE:

        Suspend job nebo.3

        send 'CMD=SUSPENDJOB ARG=nebo.3'

        receive 'SC=0 RESPONSE=job nebo.3 suspended'



ResumeJob

    The 'ResumeJob' command can only be issued against a job in the state 'Suspended'.  This command resumes a suspended job returning it to the 'Running' state.

    send     CMD=RESUMEJOB ARG=<JOBID>

    receive  SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE <  0 indicates FAILURE
           RESPONSE   is a text message further describing an error or state

    EXAMPLE:

        Resume job nebo.3

        send 'CMD=RESUMEJOB ARG=nebo.3'

        receive 'SC=0 RESPONSE=job nebo.3 resumed'
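The state preconditions stated for StartJob, CancelJob, SuspendJob, and ResumeJob can be summarised as a transition table. This is a sketch, not part of the interface: the names `TRANSITIONS` and `next_state` are hypothetical, and treating a 'Suspended' job as an "active" job for CancelJob is an assumption.

```python
# (command, current state) -> resulting state, per the command descriptions.
TRANSITIONS = {
    ("STARTJOB", "Idle"): "Running",
    ("SUSPENDJOB", "Running"): "Suspended",
    ("RESUMEJOB", "Suspended"): "Running",
    ("CANCELJOB", "Idle"): "Cancelled",
    ("CANCELJOB", "Running"): "Cancelled",
    ("CANCELJOB", "Suspended"): "Cancelled",  # assumption: suspended counts as active
}


def next_state(command: str, state: str) -> str:
    """Return the job state a command leads to, or raise if it does not apply."""
    try:
        return TRANSITIONS[(command, state)]
    except KeyError:
        raise ValueError(f"{command} cannot be applied to a job in state {state}") from None
```

A scheduler-side client could consult such a table before sending a command, avoiding a round trip that the resource manager would reject.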



JobAddTask

    The 'JobAddTask' command allocates additional tasks to an active job.

    send

        CMD=JOBADDTASK ARG=<JOBID> <NODEID> [<NODEID>]...

    receive

        SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE <  0 indicates FAILURE
           RESPONSE   is a text message possibly further describing an error or state

    EXAMPLE:

        Add 3 default tasks to job nebo30023.0 using resources located on nodes cluster002, cluster016, and cluster112.

        send 'CMD=JOBADDTASK ARG=nebo30023.0 DEFAULT cluster002 cluster016 cluster112'

        receive 'SC=0 RESPONSE=3 tasks added'



JobReleaseTask

    The 'JobReleaseTask' command removes tasks from an active job.

    send

        CMD=JOBRELEASETASK ARG=<JOBID> <TASKID> [<TASKID>]...

    receive

        SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE <  0 indicates FAILURE
           RESPONSE   is a text message further describing an error or state

    EXAMPLE:

        Free resources allocated to tasks 14, 15, and 16 of job nebo30023.0

        send 'CMD=JOBRELEASETASK ARG=nebo30023.0 14 15 16'

        receive 'SC=0 RESPONSE=3 tasks removed'

