SSS Job Object Specification
Draft Release Version 3.0.3
Scott Jackson, PNNL
David Jackson,
Brett Bode,
Scalable Systems Software Job Object Specification
Status of this Memo
This document describes the job object to be used by Scalable Systems Software compliant components. This specification is intended to be used in conjunction with the SSSRMAP protocol, with the job object passed in the Data field of Requests and Responses. Queries can be issued to a job-cognizant component as modified XPath expressions in the Get field to extract specific information from the job object, as described in the SSSRMAP protocol.
Abstract
This document describes the syntax and structure of the SSS job object. It describes a job model that is flexible enough to support very simple jobs as well as jobs with elaborate specification requirements, while avoiding complex structures and syntax when they are not needed. The basic assumption is that a single job specification should be usable for all phases of the job lifecycle, including submission, queuing, staging, reservations, quotations, execution, charging, and accounting. The specification supports multi-step jobs as well as jobs with disparate task descriptions. It accounts for operational requirements in a grid or meta-scheduled environment, where a job may be executed by multiple hosts in different administrative domains that support different resource management systems.
2 Conventions used in this document
2.2 Table Column Interpretations
2.3 Element Syntax Cardinality
4.1.1 Simple JobGroup Properties
5 Job and JobDefaults Element
5.1.5 TaskDistribution Element
6 TaskGroup and TaskGroupDefaults Element
6.1.1 Simple TaskGroup Properties
7 Task and TaskDefaults Element
9 AwarenessPolicy Attribute
Units of Measure Abbreviations
This specification proposes a standard XML representation for a job object for use by the various components in the SSS Resource Management System. This object will be used in multiple contexts and by multiple components. It is anticipated that this object will be passed via the Data Element of SSSRMAP Requests and Responses.
There are several goals motivating the design of this representation.
The representation needs to be inherently flexible. We recognize we will not be able to exhaustively include the ever-changing job properties and capabilities that constantly arise.
The representation should use the same job object at all stages of that job’s lifecycle. This object will be used at job submission, queuing, scheduling, charging, and accounting; hence it may need to distinguish between requested and delivered properties.
The design must account for the properties and structure required to function in a meta-scheduled or grid environment. It needs to include the capability to support local mapping of properties, global namespaces, etc.
The equivalent of multi-step jobs must be supported. Each step (job) can have multiple logical task descriptions.
Many potential users of the specification will not be prepared to implement the complex portions or fine-granularity that others need. There needs to be a way to allow the more complicated structure to be added as needed while leaving more straightforward cases simple.
There needs to be guidance for how to understand a given job object when higher order features are not supported by an implementation, and which parts are required, recommended and optional for implementers to implement.
It needs to support composite resources.
It should include the ability to specify preferences or fuzzy requirements.
Namespace considerations and naming conventions for most property values are outside of the scope of this document.
This example shows a simple job object that captures the requirements of a simple job.
<Job>
  <JobId>PBS.1234.0</JobId>
  <JobState>Idle</JobState>
  <UserId>scottmo</UserId>
  <Executable>/bin/hostname</Executable>
  <Processors>16</Processors>
  <WallDuration>3600</WallDuration>
</Job>
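A component that receives such an object can read its simple properties with any XML parser. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The helper name job_properties and the decision to convert Processors and WallDuration to integers are illustrative assumptions, not part of this specification.

```python
import xml.etree.ElementTree as ET

# The simple job object from the example above, embedded as a string.
SIMPLE_JOB = """
<Job>
  <JobId>PBS.1234.0</JobId>
  <JobState>Idle</JobState>
  <UserId>scottmo</UserId>
  <Executable>/bin/hostname</Executable>
  <Processors>16</Processors>
  <WallDuration>3600</WallDuration>
</Job>
"""


def job_properties(xml_text):
    """Return the text-only child elements of a Job as a dict (hypothetical helper)."""
    job = ET.fromstring(xml_text)
    # Children that themselves contain elements (len(child) > 0) are skipped.
    return {child.tag: (child.text or "").strip()
            for child in job if len(child) == 0}


props = job_properties(SIMPLE_JOB)
processors = int(props["Processors"])      # assumed to be a whole number
wall_seconds = int(props["WallDuration"])  # duration in seconds
print(props["JobId"], props["JobState"], processors, wall_seconds)
```

Because every property in the simple form is a direct child of Job, a flat dictionary is enough here; the richer examples that follow need a tree-aware approach.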
This example shows a moderately complex job object that uses features such as required versus delivered properties.
<Job>
  <JobId>PBS.1234.0</JobId>
  <JobName>Heavy Water</JobName>
  <ProjectId>nwchemdev</ProjectId>
  <UserId>peterk</UserId>
  <Application>NWChem</Application>
  <Executable>/usr/local/nwchem/bin/nwchem</Executable>
  <Arguments>-input basis.in</Arguments>
  <InitialWorkingDirectory>/home/peterk</InitialWorkingDirectory>
  <MachineName>Colony</MachineName>
  <QualityOfService>BottomFeeder</QualityOfService>
  <Queue>batch_normal</Queue>
  <JobState>Completed</JobState>
  <StartTime>1051557713</StartTime>
  <EndTime>1051558868</EndTime>
  <Charge>25410</Charge>
  <Requested>
    <Processors op="ge">12</Processors>
    <Memory op="ge" units="GB">2</Memory>
    <WallDuration>3600</WallDuration>
  </Requested>
  <Delivered>
    <Processors>16</Processors>
    <Memory metric="Average" units="GB">1.89</Memory>
    <WallDuration>1155</WallDuration>
  </Delivered>
  <Environment>
    <Variable name="PATH">/usr/bin:/home/peterk</Variable>
  </Environment>
</Job>
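The Requested and Delivered blocks can be lined up property by property, for instance when producing an accounting report. The sketch below is one possible way to do this in Python. It uses an abbreviated copy of the job above; the function name side_by_side is hypothetical, and the code deliberately does not interpret the op attribute, since its full semantics are not defined in this part of the document.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the moderately complex job above.
JOB = """
<Job>
  <StartTime>1051557713</StartTime>
  <EndTime>1051558868</EndTime>
  <Requested>
    <Processors op="ge">12</Processors>
    <Memory op="ge" units="GB">2</Memory>
    <WallDuration>3600</WallDuration>
  </Requested>
  <Delivered>
    <Processors>16</Processors>
    <Memory metric="Average" units="GB">1.89</Memory>
    <WallDuration>1155</WallDuration>
  </Delivered>
</Job>
"""


def side_by_side(job):
    """Pair each requested property with its delivered counterpart, if any."""
    requested = {e.tag: e for e in job.find("Requested")}
    delivered = {e.tag: e for e in job.find("Delivered")}
    rows = []
    for tag, req in requested.items():
        got = delivered.get(tag)
        rows.append((tag, req.get("op"), req.text,
                     got.text if got is not None else None))
    return rows


job = ET.fromstring(JOB)
rows = side_by_side(job)
for tag, op, req, got in rows:
    print(f"{tag}: requested {op or ''} {req}, delivered {got}")

# Cross-check: the delivered wall duration equals EndTime minus StartTime.
elapsed = int(job.findtext("EndTime")) - int(job.findtext("StartTime"))
delivered_wall = int(job.findtext("Delivered/WallDuration"))
print(elapsed == delivered_wall)
```

In this example the elapsed time computed from StartTime and EndTime agrees with the delivered WallDuration of 1155 seconds.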
This example uses a job group to encapsulate a multi-step job. It shows this protocol’s ability to characterize complex job processing capabilities. A component that processes this message is free to retain only that part of the information that it requires. Superfluous information can be ignored by the component or filtered out (by XSLT for example).
<JobGroup>
  <JobGroupId>fr15n05.1234</JobGroupId>
  <JobGroupState>Active</JobGroupState>
  <JobGroupName>ShuttleTakeoff</JobGroupName>
  <JobDefaults>
    <StagedTime>1051557859</StagedTime>
    <SubmitHost>asteroid.lbl.gov</SubmitHost>
    <SubmissionTime>1051556734</SubmissionTime>
    <ProjectId>GrandChallenge18</ProjectId>
    <GlobalUserId>C=US,O=LBNL,CN=Keith Jackson</GlobalUserId>
    <UserId>keith</UserId>
    <Environment>
      <Variable name="LD_LIBRARY_PATH">/usr/lib</Variable>
      <Variable name="PATH">/usr/bin:~/bin:</Variable>
    </Environment>
  </JobDefaults>
  <Job>
    <JobId>fr15n05.1234.0</JobId>
    <JobName>Launch Vector Initialization</JobName>
    <Executable>/usr/local/gridphys/bin/lvcalc</Executable>
    <Queue>batch</Queue>
    <JobState>Completed</JobState>
    <MachineName>SMP2.emsl.pnl.gov</MachineName>
    <StartTime>1051557713</StartTime>
    <EndTime>1051558868</EndTime>
    <QuoteId>http://www.pnl.gov/SMP2#654321</QuoteId>
    <Charge units="USD">12.75</Charge>
    <Requested>
      <WallDuration>3600</WallDuration>
      <Processors>2</Processors>
      <Memory>1024</Memory>
    </Requested>
    <Delivered>
      <WallDuration>1155</WallDuration>
      <Processors consumptionRate="0.78">2</Processors>
      <Memory metric="max">975</Memory>
    </Delivered>
    <TaskGroup>
      <TaskCount>2</TaskCount>
      <TaskDistribution type="TasksPerNode">1</TaskDistribution>
      <Task>
        <Node>node1</Node>
        <ProcessId>99353</ProcessId>
      </Task>
      <Task>
        <Node>node12</Node>
        <ProcessId>80209</ProcessId>
      </Task>
    </TaskGroup>
  </Job>
  <Job>
    <JobId>fr15n05.1234.1</JobId>
    <JobName>3-Phase Ascension</JobName>
    <Queue>batch_normal</Queue>
    <JobState>Idle</JobState>
    <MachineName>Colony.emsl.pnl.gov</MachineName>
    <Priority>1032847</Priority>
    <Hold>System</Hold>
    <StatusMessage>Insufficient funds to start job</StatusMessage>
    <Requested>
      <WallDuration>43200</WallDuration>
    </Requested>
    <TaskGroup>
      <TaskCount>1</TaskCount>
      <TaskGroupName>Master</TaskGroupName>
      <Executable>/usr/local/bin/stage-coordinator</Executable>
      <Memory>2048</Memory>
      <Resource name="License" type="ESSL2">1</Resource>
      <NodeProperties>
        <Feature>Jumbo-Frame</Feature>
      </NodeProperties>
    </TaskGroup>
    <TaskGroup>
      <TaskGroupName>Slave</TaskGroupName>
      <TaskDistribution type="Rule">RoundRobin</TaskDistribution>
      <Executable>/usr/local/bin/stage-slave</Executable>
      <NodeCount>4</NodeCount>
      <Requested>
        <Processors group="-1">12</Processors>
        <Processors conj="or" group="1">16</Processors>
        <Memory>512</Memory>
        <NodeProperties>
          <Name op="match">fr15n.*</Name>
        </NodeProperties>
      </Requested>
    </TaskGroup>
  </Job>
</JobGroup>
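As noted for this example, a component may retain only the part of the job group it needs. The sketch below shows one way a hypothetical queue-status component might do that in Python instead of XSLT. The KEEP set and the function name retain are assumptions made for illustration, and the input is an abbreviated copy of the job group above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the job group above.
JOB_GROUP = """
<JobGroup>
  <JobGroupId>fr15n05.1234</JobGroupId>
  <Job>
    <JobId>fr15n05.1234.0</JobId>
    <JobName>Launch Vector Initialization</JobName>
    <Queue>batch</Queue>
    <JobState>Completed</JobState>
    <Charge units="USD">12.75</Charge>
  </Job>
  <Job>
    <JobId>fr15n05.1234.1</JobId>
    <JobName>3-Phase Ascension</JobName>
    <Queue>batch_normal</Queue>
    <JobState>Idle</JobState>
    <Hold>System</Hold>
  </Job>
</JobGroup>
"""

# Hypothetical set of properties a queue-status component cares about.
KEEP = {"JobId", "JobName", "Queue", "JobState"}


def retain(job_group, keep):
    """Remove, in place, every direct child of each Job whose tag is not in keep."""
    for job in job_group.findall("Job"):
        for child in list(job):  # copy the list so removal is safe while iterating
            if child.tag not in keep:
                job.remove(child)
    return job_group


group = retain(ET.fromstring(JOB_GROUP), KEEP)
summary = [(j.findtext("JobId"), j.findtext("JobState"))
           for j in group.findall("Job")]
print(summary)
```

Filtering on element names keeps the component independent of properties it does not understand, which fits the goal of letting simple implementations ignore the more elaborate parts of the object.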
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC2119.
The columns of the property tables in this document have the following meanings:
Element Name: Name of the XML element (xsd:element); see [DATATYPES]
Type: Data type defined by xsd (XML Schema Definition) as:
    String    xsd:string (a finite-length sequence of printable characters)
    Integer   xsd:integer (a signed finite-length sequence of decimal digits)
    Float     xsd:float (single-precision 32-bit floating point)
    Boolean   xsd:boolean (consists of the literals “true” or “false”)
    DateTime  xsd:int (a 32-bit integer count of seconds since the epoch, GMT)
    Duration  xsd:int (a 32-bit integer measured in seconds)
Description: Brief description of the meani