Workload Accounting Records
Moab Workload Manager®

16.3.3 Workload Accounting Records

Moab workload accounting records fully describe all scheduling-relevant aspects of batch jobs, including resources requested and used, the time of all major scheduling events (such as submission time and start time), the job credentials used, and the job execution environment. Each job trace is a single line of whitespace-delimited fields, as shown in the table in section 16.3.3.1.

NOTE: Moab can be configured to provide this information in flat text tabular form or in XML format conforming to the SSS 1.0 job description specification.


16.3.3.1 Workload Event Record Format (v 5.0.0)

All job events (JOBSUBMIT, JOBSTART, JOBEND, and so forth) provide job data in a standard format as described in the following table:

Field Name | Field Index | Data Format | Default Value | Details
Event Time (Human Readable) | 1 | HH:MM:SS | - | Specifies time event occurred.
Event Time (Epoch) | 2 | <epochtime> | - | Specifies time event occurred.
Object Type | 3 | job | - | Specifies record object type.
Object ID | 4 | <STRING> | - | Unique object identifier.
Object Event | 5 | one of jobcancel, jobcheckpoint, jobend, jobfailure, jobhold, jobmigrate, jobpreempt, jobreject, jobresume, jobstart or jobsubmit | - | Specifies record event type.
Nodes Requested | 6 | <INTEGER> | 0 | Number of nodes requested (0 = no node request count specified).
Tasks Requested | 7 | <INTEGER> | 1 | Number of tasks requested.
User Name | 8 | <STRING> | - | Name of user submitting job.
Group Name | 9 | <STRING> | - | Primary group of user submitting job.
Wallclock Limit | 10 | <INTEGER> | 1 | Maximum allowed job duration (in seconds).
Job Event State | 11 | <STRING> | - | Job state at time of event.
Required Class | 12 | <STRING> | [DEFAULT:1] | Class/queue required by job specified as square bracket list of <QUEUE>[:<QUEUEINSTANCE>] requirements. (For example: [batch:1])
Submission Time | 13 | <INTEGER> | 0 | Epoch time when job was submitted.
Dispatch Time | 14 | <INTEGER> | 0 | Epoch time when scheduler requested job begin executing.
Start Time | 15 | <INTEGER> | 0 | Epoch time when job began executing. (NOTE: Usually identical to Dispatch Time.)
Completion Time | 16 | <INTEGER> | 0 | Epoch time when job completed execution.
Required Network Adapter | 17 | <STRING> | - | Name of required network adapter if specified.
Required Node Architecture | 18 | <STRING> | - | Required node architecture if specified.
Required Node Operating System | 19 | <STRING> | - | Required node operating system if specified.
Required Node Memory Comparison | 20 | one of >, >=, =, <=, < | >= | Comparison for determining compliance with required node memory.
Required Node Memory | 21 | <INTEGER> | 0 | Amount of required configured RAM (in MB) on each node.
Required Node Disk Comparison | 22 | one of >, >=, =, <=, < | >= | Comparison for determining compliance with required node disk.
Required Node Disk | 23 | <INTEGER> | 0 | Amount of required configured local disk (in MB) on each node.
Required Node Attributes/Features | 24 | <STRING> | - | Square bracket enclosed list of node features required by job if specified. (For example: [fast][ethernet])
System Queue Time | 25 | <INTEGER> | 0 | Epoch time when job met all fairness policies.
Tasks Allocated | 26 | <INTEGER> | <TASKS REQUESTED> | Number of tasks actually allocated to job. (NOTE: In most cases, this field is identical to field #7, Tasks Requested.)
Required Tasks Per Node | 27 | <INTEGER> | -1 | Number of Tasks Per Node required by job or '-1' if no requirement specified.
QOS | 28 | <STRING>[:<STRING>] | - | QoS requested/assigned using the format <QOS_REQUESTED>[:<QOS_DELIVERED>]. (For example: hipriority:bottomfeeder)
JobFlags | 29 | <STRING>[:<STRING>]... | - | Square bracket delimited list of job attributes. (For example: [BACKFILL][BENCHMARK][PREEMPTEE])
Account Name | 30 | <STRING> | - | Name of account associated with job if specified.
Executable | 31 | <STRING> | - | Name of job executable if specified.
Resource Manager Extension String | 32 | <STRING> | - | Resource manager specific list of job attributes if specified. See the Resource Manager Extension Overview for more information.
Bypass Count | 33 | <INTEGER> | -1 | Number of times job was bypassed by lower priority jobs via backfill or '-1' if not specified.
ProcSeconds Utilized | 34 | <DOUBLE> | 0 | Number of processor seconds actually used by job.
Partition Name | 35 | <STRING> | [DEFAULT] | Name of partition in which job ran.
Dedicated Processors per Task | 36 | <INTEGER> | 1 | Number of processors required per task.
Dedicated Memory per Task | 37 | <INTEGER> | 0 | Amount of RAM (in MB) required per task.
Dedicated Disk per Task | 38 | <INTEGER> | 0 | Amount of local disk (in MB) required per task.
Dedicated Swap per Task | 39 | <INTEGER> | 0 | Amount of virtual memory (in MB) required per task.
Start Date | 40 | <INTEGER> | 0 | Epoch time indicating earliest time job can start.
End Date | 41 | <INTEGER> | 0 | Epoch time indicating latest time by which job must complete.
Allocated Host List | 42 | <hostname>[,<hostname>]... | - | Comma delimited list of hosts allocated to job. (For example: node001,node004)
Resource Manager Name | 43 | <STRING> | - | Name of resource manager if specified.
Required Host List | 44 | <hostname>[,<hostname>]... | - | List of hosts required by job. (If the job's taskcount is greater than the specified number of hosts, the scheduler must use these nodes in addition to others; if the job's taskcount is less than the specified number of hosts, the scheduler must select needed hosts from this list.)
Reservation | 45 | <STRING> | - | Name of reservation required by job if specified.
Application Simulator Data | 46 | <STRING>[:<STRING>] | - | Name of application simulator module and associated configuration data. (For example: HSM:IN=infile.txt:140000;OUT=outfile.txt:500000)
Set Description | 47 | <STRING>:<STRING>[:<STRING>] | - | Set constraints required by node in the form <SetConstraint>:<SetType>[:<SetList>] where SetConstraint is one of ONEOF, FIRSTOF, or ANYOF, SetType is one of PROCSPEED, FEATURE, or NETWORK, and SetList is an optional colon delimited list of allowed set attributes. (For example: ONEOF:PROCSPEED:350:450:500)
Job Message | 48 | <STRING> | - | Job messages including resource manager, scheduler, and administrator messages if specified.
Job Cost | 49 | <DOUBLE> | 0.0 | Cost of executing job incorporating resource consumption metric; resource quantity consumed; and credential, allocated resource, and delivered QoS charge rates.
History | 50 | <STRING> | - | List of job events impacting resource allocation (XML). (NOTE: History information is only reported in Moab 5.1.0 and higher.)
Utilization | 51 | Comma delimited list of one or more <ATTR>=<VALUE> pairs where <VALUE> is a double and <ATTR> is one of the following: network (in MB transferred), license (in license-seconds), storage (in MB-seconds stored), or gmetric:<TYPE>. | - | Cumulative resources used over life of job.
Estimate Data | 52 | <STRING> | - | List of job estimate usage.
Completion Code | 53 | <INTEGER> | - | Job exit status/completion code.
Extended Memory Load Information | 54 | <STRING> | - | Extended memory usage statistics (max, mem, avg, and so forth).
Extended CPU Load Information | 55 | <STRING> | - | Extended CPU usage statistics (max, mem, avg, and so forth).

NOTE: If no applicable value is specified, the exact string - should be entered.

NOTE: Fields that contain a description string, such as Job Message, use a packed string format. The packed string format replaces white space characters such as spaces and carriage returns with a hex character representation. For example, a blank space is represented as \20. Since fields in the event record are space delimited, this preserves the correct order and spacing of fields in the record.
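The packed string format described in the note above can be illustrated with a minimal Python sketch. It assumes that every packed character is written as a backslash followed by exactly two hexadecimal digits (as in \20); the helper names unpack_field and pack_field are hypothetical and are not part of Moab.

```python
import re

# Assumption: a packed character is a backslash followed by exactly two hex digits.
_PACKED = re.compile(r"\\([0-9A-Fa-f]{2})")


def unpack_field(value: str) -> str:
    """Turn packed sequences such as \\20 back into the characters they stand for."""
    return _PACKED.sub(lambda m: chr(int(m.group(1), 16)), value)


def pack_field(value: str) -> str:
    """Replace each white space character with backslash + two hex digits."""
    return re.sub(r"\s", lambda m: "\\%02x" % ord(m.group(0)), value)


print(unpack_field(r"job\20held\20by\20admin"))
print(pack_field("job held by admin"))
```

Because a packed field contains no literal white space, splitting a record on white space still yields one token per field.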

Sample Workload Trace

13:21:05 110244355 job 1413 JOBEND 20 20 josh staff 86400 Removed [batch:1] 887343658 889585185 \
889585185 889585411 ethernet R6000 AIX53 >= 256 >= 0 - 889584538 20 0 0 2 0 test.cmd \
1001 6 678.08 0 1 0 0 0 0 0 - 0 - - - - - - - - 0.0 - - - 0 - -
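As a rough illustration of how such a record might be read, the following Python sketch splits the sample trace on white space and names the first 16 fields from the table in section 16.3.3.1. The dictionary keys and the function name parse_job_record are illustrative choices, not Moab identifiers. The remaining tokens are kept as a positional list rather than named. The trailing backslashes are treated as display line continuations, which is an assumption about how the sample is printed.

```python
# Names for fields 1-16 of the job event record (keys are illustrative).
JOB_FIELDS = [
    "event_time", "event_epoch", "object_type", "object_id", "object_event",
    "nodes_requested", "tasks_requested", "user", "group", "wallclock_limit",
    "job_state", "required_class", "submit_time", "dispatch_time",
    "start_time", "completion_time",
]

SAMPLE = """13:21:05 110244355 job 1413 JOBEND 20 20 josh staff 86400 Removed [batch:1] 887343658 889585185 \\
889585185 889585411 ethernet R6000 AIX53 >= 256 >= 0 - 889584538 20 0 0 2 0 test.cmd \\
1001 6 678.08 0 1 0 0 0 0 0 - 0 - - - - - - - - 0.0 - - - 0 - -"""


def parse_job_record(text: str) -> dict:
    """Split one job event record into named leading fields plus the rest."""
    # Join continuation lines (backslash + newline), then split on white space.
    tokens = text.replace("\\\n", " ").split()
    if len(tokens) < len(JOB_FIELDS):
        raise ValueError("record has fewer than 16 fields")
    record = dict(zip(JOB_FIELDS, tokens))
    record["remaining_fields"] = tokens[len(JOB_FIELDS):]
    return record


rec = parse_job_record(SAMPLE)
run_time = int(rec["completion_time"]) - int(rec["start_time"])
print(rec["object_id"], rec["object_event"], rec["user"], run_time)
```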

16.3.3.2 Creating New Workload Simulation Traces

Because workload event records and simulation workload traces use the same format, these event records can be used as a starting point for generating a new simulation trace. In the simple case, an event record or collection of event records can be used directly as the value of the SIMWORKLOADTRACEFILE parameter, as in the following example:

using event records
# collect all job records for December
> cat /opt/moab/stats/events.*Dec*2006 | grep JOBEND > /opt/moab/DecJobs.txt

# edit moab.cfg to use the job records
> vi /opt/moab/moab.cfg
  (add 'SIMWORKLOADTRACEFILE /opt/moab/DecJobs.txt')
  (set SIMRESOURCETRACEFILE, SCHEDCFG[] MODE and other simulation parameters as described in the Simulation Overview)

# start the simulation
> moab

NOTE: In the preceding example, all non-JOBEND events were filtered out. This step is not required, but only JOBEND events are used in a simulation; other events are ignored by Moab.
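The same filtering can be sketched in Python for cases where the selection needs to be stricter than a substring match. Unlike grep, this sketch checks the Object Type (field 3) and Object Event (field 5) columns explicitly, and it compares the event name case-insensitively. The function name jobend_lines and the in-memory sample lines are illustrative only.

```python
def jobend_lines(lines):
    """Yield only job records whose Object Event (field 5) is JOBEND."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and fields[2] == "job" and fields[4].upper() == "JOBEND":
            yield line


# Shortened, made-up records used purely to exercise the filter.
events = [
    "13:20:00 110244290 job 1413 JOBSTART 20 20 josh staff 86400",
    "13:21:05 110244355 job 1413 JOBEND 20 20 josh staff 86400",
    "13:22:00 110244410 rsv system.1 rsvend 0 0 0 0 0",
]

for line in jobend_lines(events):
    print(line)
```

To process real event files, the list can be replaced by an open file object, since the function only iterates over lines.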

Modifying Existing Job Event Records

When creating a new simulation workload, it is often valuable to start with workload traces representing a well-known or even local workload. These traces preserve distribution information about job submission times, durations, processor count, users, groups, projects, special resource requests, and numerous other factors that effectively represent an industry, user base, or organization.

When modifying records, a field or combination of fields can be altered, new jobs inserted, or certain jobs filtered out.
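A minimal Python sketch of such a modification pass is shown here. It drops the jobs of selected users (field 8, User Name) and scales the Wallclock Limit (field 10). The function name rewrite_trace and its parameters are hypothetical, and the input lines are shortened, made-up records.

```python
USER = 8 - 1              # field 8 in the record format, zero-based index 7
WALLCLOCK_LIMIT = 10 - 1  # field 10 in the record format, zero-based index 9


def rewrite_trace(lines, drop_users=(), wallclock_scale=1.0):
    """Return a new list of records with some jobs removed and limits rescaled."""
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) <= WALLCLOCK_LIMIT:
            continue  # not a complete job record; skip it
        if fields[USER] in drop_users:
            continue  # filter this job out of the workload
        scaled = int(int(fields[WALLCLOCK_LIMIT]) * wallclock_scale)
        fields[WALLCLOCK_LIMIT] = str(scaled)
        result.append(" ".join(fields))
    return result


trace = [
    "13:21:05 110244355 job 1413 JOBEND 20 20 josh staff 86400 Removed",
    "13:25:10 110244600 job 1414 JOBEND 4 4 guest staff 3600 Completed",
]

for line in rewrite_trace(trace, drop_users={"guest"}, wallclock_scale=0.5):
    print(line)
```

Rejoining fields with a single space is safe here only because records are white space delimited and packed strings contain no literal spaces.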

NOTE: Because job event records are used for multiple purposes, some of the fields are valuable for statistics or auditing purposes but are ignored in simulations. For the most part, fields representing resource utilization information are ignored while fields representing resource requests are not.

Modifying Time Distribution Factors of a Workload Trace

In some cases, simulations focus on determining the effects of changing the quantities or types of jobs, or of changing policies or job ownership, on system performance and resource utilization. In other cases, simulations focus on response-time metrics as the job submission and job duration aspects of the workload are modified. Which time-based fields are important to modify depends on the simulation purpose and the setting of the JOBSUBMISSIONPOLICY parameter.

JOBSUBMISSIONPOLICY Value | Critical Time Based Fields
NORMAL | WallClock Limit, Submission Time, Start Time, Completion Time
CONSTANTJOBDEPTH, CONSTANTPSDEPTH | WallClock Limit, Start Time, Completion Time

NOTE 1:  Dispatch Time should always be identical to Start Time.
NOTE 2:  In all cases, the difference of 'Completion Time - Start Time' is used to determine actual job run time.
NOTE 3:  System Queue Time and Proc-Seconds Utilized are only used for statistics gathering purposes and will not alter the behavior of the simulation.
NOTE 4:  In all cases, relative time values are important, i.e., Start Time must be greater than or equal to Submission Time and less than Completion Time.
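The constraints in notes 1, 2, and 4 can be checked mechanically before a modified trace is handed to a simulation. The following Python sketch is one way to do so. The function names check_times and run_time are hypothetical, and the indices follow the job event record table (fields 13 to 16).

```python
SUBMIT, DISPATCH, START, COMPLETE = 12, 13, 14, 15  # zero-based indices of fields 13-16


def check_times(fields):
    """Return a list of problems found in the time fields of one job record."""
    submit, dispatch, start, complete = (
        int(fields[i]) for i in (SUBMIT, DISPATCH, START, COMPLETE)
    )
    problems = []
    if dispatch != start:
        problems.append("Dispatch Time differs from Start Time")
    if start < submit:
        problems.append("Start Time is earlier than Submission Time")
    if start >= complete:
        problems.append("Start Time is not earlier than Completion Time")
    return problems


def run_time(fields):
    """Actual job run time in seconds: Completion Time - Start Time."""
    return int(fields[COMPLETE]) - int(fields[START])


# First 16 fields of the sample workload trace.
sample = ("13:21:05 110244355 job 1413 JOBEND 20 20 josh staff 86400 Removed "
          "[batch:1] 887343658 889585185 889585185 889585411").split()

print(check_times(sample), run_time(sample))
```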

Creating Workload Traces From Scratch

Nothing prevents a completely new workload trace from being created from scratch. To do this, simply create a file with fields matching the format described in the Workload Event Record Format section.
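As a sketch of what generating a record from scratch might look like, the following Python function builds one 55-field JOBEND line. It fills in the first 16 fields and writes the string - for every other field, following the note that - stands for an unspecified value. Several choices here are assumptions rather than documented behavior: the job state Completed, the use of UTC for the human readable event time, and the expectation that a record filled this sparsely is sufficient for a given simulation. The function name make_job_record is hypothetical.

```python
import time

TOTAL_FIELDS = 55  # number of fields in the v 5.0.0 record format table


def make_job_record(job_id, user, group, tasks, wallclock_limit, submit, start, end):
    """Build a single white space delimited JOBEND record as a string."""
    if not (submit <= start < end):
        raise ValueError("need Submission Time <= Start Time < Completion Time")
    fields = ["-"] * TOTAL_FIELDS  # '-' marks fields with no applicable value
    values = {
        1: time.strftime("%H:%M:%S", time.gmtime(end)),  # assumption: UTC
        2: end, 3: "job", 4: job_id, 5: "JOBEND",
        6: 0, 7: tasks, 8: user, 9: group, 10: wallclock_limit,
        11: "Completed",      # assumption: state name for a finished job
        12: "[DEFAULT:1]",    # default Required Class from the table
        13: submit, 14: start, 15: start, 16: end,  # Dispatch Time == Start Time
    }
    for index, value in values.items():
        fields[index - 1] = str(value)
    return " ".join(fields)


line = make_job_record("sim.1", "alice", "staff", 4, 3600,
                       1167609600, 1167609660, 1167611460)
print(line)
```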


16.3.3.3  Reservation Records/Traces

Field Name | Field Index | Data Format | Default Value | Details
Event Time (Human) | 0 | [HH:MM:SS] | - | Specifies time event occurred.
Event Time (Epoch) | 1 | <epochtime> | - | Specifies time event occurred.
Object Type | 2 | rsv | - | Specifies record object type.
Object ID | 3 | <STRING> | - | Unique object identifier.
Object Event | 4 | one of rsvcreate, rsvstart, rsvmodify, rsvfail or rsvend | - | Specifies record event type.
Creation Time | 5 | <EPOCHTIME> | - | Specifies epoch time of reservation creation.
Start Time | 6 | <EPOCHTIME> | - | Specifies epoch time of reservation start date.
End Time | 7 | <EPOCHTIME> | - | Specifies epoch time of reservation end date.
Tasks Allocated | 8 | <INTEGER> | - | Specifies number of tasks allocated to reservation at event time.
Nodes Allocated | 9 | <INTEGER> | - | Specifies number of nodes allocated to reservation at event time.
Total Active Proc-Seconds | 10 | <INTEGER> | - | Specifies proc-seconds reserved resources were dedicated to one or more jobs at event time.
Total Proc-Seconds | 11 | <INTEGER> | - | Specifies proc-seconds resources were reserved at event time.
Hostlist | 12 | <comma delimited list of hostnames> | - | Specifies list of hosts reserved at event time.
Owner | 13 | <STRING> | - | Specifies reservation ownership credentials.
ACL | 14 | <STRING> | - | Specifies reservation access control list.
Category | 15 | <STRING> | - | Specifies associated node category assigned to reservation.
Comment | 16 | <STRING> | - | Specifies general human readable event message.
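Reservation records can be read the same way as job records, with the difference that the field index in this table starts at 0. The following Python sketch is illustrative only. The sample line is made up for this example (it is not taken from a real Moab event file), and the dictionary keys and the function name parse_rsv_record are hypothetical.

```python
# Names for fields 0-16 of the reservation record (keys are illustrative).
RSV_FIELDS = [
    "event_time", "event_epoch", "object_type", "object_id", "object_event",
    "creation_time", "start_time", "end_time", "tasks_allocated",
    "nodes_allocated", "active_proc_seconds", "total_proc_seconds",
    "hostlist", "owner", "acl", "category", "comment",
]

INT_FIELDS = ("event_epoch", "creation_time", "start_time", "end_time",
              "tasks_allocated", "nodes_allocated")


def parse_rsv_record(line: str) -> dict:
    """Split one reservation record into named fields."""
    tokens = line.split()
    if len(tokens) < len(RSV_FIELDS):
        raise ValueError("record has fewer than 17 fields")
    record = dict(zip(RSV_FIELDS, tokens))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    # '-' means no value was specified; otherwise split the comma delimited list.
    hosts = record["hostlist"]
    record["hostlist"] = [] if hosts == "-" else hosts.split(",")
    return record


# Hypothetical record, invented for illustration.
SAMPLE_RSV = ("01:00:00 1167613200 rsv system.1 rsvstart 1167609600 "
              "1167613200 1167616800 4 2 0 0 node001,node002 - - - -")

rsv = parse_rsv_record(SAMPLE_RSV)
print(rsv["object_id"], rsv["hostlist"], rsv["end_time"] - rsv["start_time"])
```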
