TORQUE provides the ability to run scripts before and/or after each job executes. With such scripts, a site can prepare systems, perform node health checks, prepend and append text to output and error log files, clean up systems, and so on.
The table below shows which pbs_mom runs each script. All scripts must be in the $PBS_HOME/mom_priv/ directory and be available on every compute node. By default, $PBS_HOME is /usr/spool/PBS/. Mother Superior is the pbs_mom on the first node allocated to the job; sisters are all other pbs_moms. $USER_HOME is the $HOME directory of the user running the job.
Script | Execution Location | Privileges | Execution Directory | File Permissions
--- | --- | --- | --- | ---
prologue | Mother Superior | root | $PBS_HOME/mom_priv/ | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
epilogue | Mother Superior | root | $USER_HOME | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
prologue.<name> | Mother Superior | root | $PBS_HOME/mom_priv/ | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
epilogue.<name> | Mother Superior | root | $PBS_HOME/mom_priv/ | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
prologue.user | Mother Superior | user | $PBS_HOME/mom_priv/ | readable and executable by root and other (e.g., -r-x---r-x)
epilogue.user | Mother Superior | user | $USER_HOME | readable and executable by root and other (e.g., -r-x---r-x)
prologue.parallel | Sisters | root | $PBS_HOME/mom_priv/ | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
epilogue.parallel* | Sisters | root | | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
epilogue.precancel | Mother Superior (see note) | root | $USER_HOME | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)

* available in Version 2.1

NOTE: epilogue.precancel is run after a job cancel request is received from pbs_server and before a kill signal is sent to the job process.
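Because pbs_mom treats a root-run script that is not owned by root, or is writable by others, as a permission error (exit code -1 in G.4), it can help to verify the files before deploying them. The following is a minimal POSIX sh sketch; the function name check_root_script is hypothetical, it is meant to be run as root, and it covers only the root-run scripts in the table above, not prologue.user or epilogue.user.

```shell
#!/bin/sh
# Hypothetical helper: check that a root-run script in mom_priv meets the
# requirements in the table above (owned by root, readable and executable
# by its owner, not writable by group or other).

check_root_script() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "$f: not a regular file"
        return 1
    fi
    # -user root matches only files owned by root.
    if [ -z "$(find "$f" -user root)" ]; then
        echo "$f: not owned by root"
        return 1
    fi
    # -perm -500 matches files whose owner read AND execute bits are set.
    if [ -z "$(find "$f" -perm -500)" ]; then
        echo "$f: not readable and executable by its owner"
        return 1
    fi
    # -perm -020 / -perm -002 match files with the group / other write bit set.
    if [ -n "$(find "$f" \( -perm -020 -o -perm -002 \))" ]; then
        echo "$f: writable by group or other"
        return 1
    fi
    echo "$f: ok"
    return 0
}

# Example (PBS_HOME defaults to /usr/spool/PBS):
# check_root_script "${PBS_HOME:-/usr/spool/PBS}/mom_priv/prologue"
```

Using find's -perm and -user tests keeps the check portable; GNU-specific options such as stat -c are avoided on purpose.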
G.1 Script Environment
The prologue and epilogue scripts can be very simple. On most systems, the script must declare the execution shell using the #!<SHELL> syntax (e.g., #!/bin/sh). In addition, the script may process the context-sensitive arguments that TORQUE passes to it.
Prolog Environment
The following arguments are passed to the prologue, prologue.user and prologue.parallel scripts:
Argument | Description
--- | ---
argv[1] | job id
argv[2] | job execution user name
argv[3] | job execution group name
argv[4] | job name (TORQUE 1.2.0p4 and higher only)
argv[5] | list of requested resource limits (TORQUE 1.2.0p4 and higher only)
argv[6] | job execution queue (TORQUE 1.2.0p4 and higher only)
argv[7] | job account (TORQUE 1.2.0p4 and higher only)
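A minimal sketch of a prologue that gives these positional arguments meaningful names. It assumes /bin/sh and treats arguments 4 through 7 as optional, since releases older than TORQUE 1.2.0p4 pass only the first three. The function and variable names are hypothetical. Anything the script echoes goes to the job's output file.

```shell
#!/bin/sh
# Sketch: name the arguments TORQUE passes to prologue, prologue.user
# and prologue.parallel. Arguments 4-7 exist only in TORQUE 1.2.0p4+;
# the ${N-} form expands to an empty string when they are absent
# (and stays safe if the script later enables set -u).

load_prologue_args() {
    job_id=$1
    job_user=$2
    job_group=$3
    job_name=${4-}
    job_limits=${5-}
    job_queue=${6-}
    job_account=${7-}
}

print_prologue_summary() {
    echo "prologue: job $job_id for $job_user:$job_group"
    if [ -n "$job_name" ]; then
        echo "prologue: name=$job_name queue=$job_queue account=$job_account"
        echo "prologue: requested limits: $job_limits"
    fi
    return 0
}

load_prologue_args "$@"
# The script's exit status is that of this last call (0), so the job runs.
print_prologue_summary
```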
Epilog Environment
TORQUE supplies the following arguments to the epilogue, epilogue.user,
epilogue.precancel, and epilogue.parallel scripts:
Argument | Description
--- | ---
argv[1] | job id
argv[2] | job execution user name
argv[3] | job execution group name
argv[4] | job name
argv[5] | session id
argv[6] | list of requested resource limits
argv[7] | list of resources used by job
argv[8] | job execution queue
argv[9] | job account
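argv[7] carries the resources the job consumed. The sketch below uses a hypothetical get_resource helper for /bin/sh and ASSUMES the list is made of comma-separated name=value pairs (for example cput=00:00:10,mem=420kb,walltime=00:00:07); confirm the format your TORQUE version actually passes before relying on it.

```shell
#!/bin/sh
# Sketch: pull one value out of the "resources used" list (argv[7])
# in an epilogue. ASSUMES a comma-separated list of name=value pairs.

get_resource() {
    list=$1
    name=$2
    # One pair per line; IFS='=' splits each pair into key and value,
    # and the value keeps any further '=' characters.
    printf '%s\n' "$list" | tr ',' '\n' | while IFS='=' read -r key value; do
        if [ "$key" = "$name" ]; then
            printf '%s\n' "$value"
            break
        fi
    done
}

print_epilogue_summary() {
    # $1 = job id, $5 = session id, $7 = resources used, $8 = queue
    echo "epilogue: job $1 (session $5) finished in queue $8"
    echo "epilogue: walltime used: $(get_resource "$7" walltime)"
    return 0
}

print_epilogue_summary "$@"
```

Exact key matching means a lookup for mem does not accidentally return the vmem value.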
The epilogue.precancel script is run after a job cancel request is
received by the MOM and before any signals are sent to job processes. If this script
exists, it is run whether the canceled job was active or idle.
For all scripts, the environment passed to the script is empty.
Standard input for all scripts is connected to a system-dependent file;
currently, on all systems this is /dev/null. Except for the epilogue scripts
of an interactive job, standard output and standard error are connected to the
output and error files associated with the job. For an interactive job, the
pseudo-terminal connection is released after the job completes, so standard
output and error point to /dev/null.
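Because the environment is empty, a script should not rely on PATH or any other variable being set. A minimal sketch of a script header that sets its own PATH; the directory list is an assumption to adapt per site, and /usr/spool/PBS is the default $PBS_HOME given earlier.

```shell
#!/bin/sh
# Sketch: header for a prologue/epilogue script. TORQUE passes an empty
# environment, so set PATH explicitly instead of inheriting it.
# The directory list is a site-specific assumption.
PATH=/bin:/usr/bin:/sbin:/usr/sbin
export PATH

# PBS_HOME is not inherited either; /usr/spool/PBS is TORQUE's default.
PBS_HOME=/usr/spool/PBS

# Standard input is /dev/null. Standard output normally goes to the job's
# output file, so this line appears there (except in the epilogues of an
# interactive job, where it goes to /dev/null).
echo "${0##*/}: running on $(uname -n)"
```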
G.2 Per Job Prologue and Epilogue Scripts
TORQUE now supports per-job prologue and epilogue scripts when using the qsub -T option. The scripts are named prologue.<name> and epilogue.<name>, where <name> is an arbitrary name. To select them when submitting a job, use qsub -T <name>.
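A sketch of installing a per-job pair under a hypothetical name, cleanup. The install_per_job_pair function and its source-file arguments are hypothetical; the commands must run as root on every compute node so the copies are owned by root, as the permissions table above requires.

```shell
#!/bin/sh
# Sketch: install prologue.<name> and epilogue.<name> into mom_priv.
# Run as root on every compute node so the copies are owned by root.
# PBS_HOME defaults to /usr/spool/PBS if it is not already set.

install_per_job_pair() {
    name=$1
    prologue_src=$2
    epilogue_src=$3
    dir=${PBS_HOME:-/usr/spool/PBS}/mom_priv

    # Reject empty names and names containing '/', which could escape mom_priv.
    case $name in
        ''|*/*) echo "invalid name: '$name'" >&2; return 1 ;;
    esac

    cp "$prologue_src" "$dir/prologue.$name" &&
    cp "$epilogue_src" "$dir/epilogue.$name" &&
    # 500 = -r-x------ : readable and executable by the owner only.
    chmod 500 "$dir/prologue.$name" "$dir/epilogue.$name"
}

# Usage: install the pair, then select it at submission time, e.g.
#   install_per_job_pair cleanup ./prologue.cleanup.src ./epilogue.cleanup.src
#   qsub -T cleanup job.sh
```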
G.3 Prologue and Epilogue Scripts Time Out
As a precaution against prologue and epilogue scripts that run too long, TORQUE places an alarm around each script's execution. By default, the alarm goes off after 5 minutes of execution. If the script exceeds this time, it is terminated and the node is marked down. You can adjust this timeout by setting the prologalarm parameter in the mom_priv/config file.
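For example, to allow prologue and epilogue scripts up to 10 minutes, an entry like the following could be added to $PBS_HOME/mom_priv/config on each compute node. This assumes the value is given in seconds (so the 5-minute default corresponds to 300) and that pbs_mom is restarted, or told to reread its configuration, afterward.

```
$prologalarm 600
```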
While TORQUE is executing the epilogue, epilogue.user, or
epilogue.precancel scripts, the job is in the E (exiting) state.
G.4 Prolog Error Processing
If the prologue script executes successfully, it should exit with a zero status. Otherwise, the script should return the appropriate error code as defined in the table below. pbs_mom reports the script's exit status to pbs_server, which in turn takes the associated action. The following table describes each exit code for the prologue scripts and the action taken.
Error | Description | Action
--- | --- | ---
-4 | The script timed out | Job will be requeued
-3 | The wait(2) call returned an error | Job will be requeued
-2 | Input file could not be opened | Job will be requeued
-1 | Permission error (script is not owned by root, or is writable by others) | Job will be requeued
0 | Successful completion | Job will run
1 | Abort exit code | Job will be aborted
>1 | Other | Job will be requeued
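A sketch of how a prologue might map its own checks onto these codes: a policy violation aborts the job with 1, while a node problem returns 2 so the job is requeued rather than lost. The function prologue_decision, the no-root-jobs policy, and the scratch directory are hypothetical examples, not TORQUE defaults.

```shell
#!/bin/sh
# Sketch: prologue that maps its checks onto the G.4 exit codes.
#   0  -> job runs
#   1  -> job is aborted
#   >1 -> job is requeued

prologue_decision() {
    job_user=$1
    scratch=${2:-/tmp}

    # Hypothetical site policy: never run batch jobs as root.
    if [ "$job_user" = root ]; then
        echo "prologue: refusing to run a job as root" >&2
        return 1
    fi

    # Node health check: the scratch directory must accept new files.
    probe="$scratch/.prologue_probe.$$"
    if ! touch "$probe" 2>/dev/null; then
        echo "prologue: $scratch unusable on $(uname -n)" >&2
        return 2
    fi
    rm -f "$probe"
    return 0
}

# argv[2] is the job execution user name. The script's exit status is
# the function's return value, which pbs_mom reports to pbs_server.
prologue_decision "$2" /tmp
```

Returning 2 for node-side failures keeps a transient node problem from killing an otherwise valid job.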
Example 1
Below are example prologue and epilogue scripts that write the arguments passed to them to the job's standard output file:
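A minimal sketch of such a script, assuming /bin/sh. Because standard output is connected to the job's output file, each echo lands there. A copy can be installed as both prologue and epilogue; the label is taken from the script's own file name via $0. The function name print_script_args is hypothetical.

```shell
#!/bin/sh
# Sketch: write every argument TORQUE passes to this script to standard
# output, which pbs_mom connects to the job's output file.
# Install one copy as prologue and another as epilogue in mom_priv.

print_script_args() {
    label=$1
    shift
    echo "----- $label arguments -----"
    i=1
    for arg in "$@"; do
        echo "argv[$i]: $arg"
        i=$((i + 1))
    done
    echo "----- end of $label arguments -----"
    return 0
}

# ${0##*/} strips the directory part, leaving e.g. "prologue".
print_script_args "${0##*/}" "$@"
```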
The Ohio Supercomputer Center contributed the following scripts:
"prologue creates a unique temporary directory on each node assigned to a job before the job begins to run, and epilogue deletes that directory after the job completes. (Note that having a separate temporary directory on each node is probably not as good as having a good, high performance parallel filesystem.)"
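Prologue, prologue.user, and prologue.parallel scripts can have dramatic effects on job scheduling if written improperly: a nonzero exit status requeues or aborts the job (see G.4), and a script that exceeds the prologalarm timeout is terminated and its node is marked down (see G.3).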
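A minimal sketch of the OSC temporary-directory idea described above, not the OSC scripts themselves: the prologue creates a directory named after the job id (argv[1]) and owned by the job's user and group (argv[2], argv[3]), and the epilogue removes it. The /tmp base directory and the function names are assumptions. Since prologue runs only on Mother Superior, creating the directory on every node would also require prologue.parallel and epilogue.parallel on the sisters, per the table at the start of this appendix.

```shell
#!/bin/sh
# Sketch: per-job temporary directory, in the spirit of the OSC scripts.
# make_job_tmpdir belongs in prologue (and prologue.parallel);
# remove_job_tmpdir belongs in epilogue (and epilogue.parallel).

job_tmpdir() {
    # $1 = base directory, $2 = job id; refuse ids that could escape $1.
    case $2 in
        ''|*/*|.|..) return 1 ;;
    esac
    printf '%s/%s\n' "$1" "$2"
}

make_job_tmpdir() {
    # $1 = base, $2 = job id, $3 = job user, $4 = job group
    dir=$(job_tmpdir "$1" "$2") || return 1
    mkdir -p "$dir" &&
    chown "$3:$4" "$dir" &&
    chmod 700 "$dir"
}

remove_job_tmpdir() {
    # $1 = base, $2 = job id
    dir=$(job_tmpdir "$1" "$2") || return 1
    rm -rf "$dir"
}

# In prologue (exit 2 so a failure requeues the job instead of aborting it):
#   make_job_tmpdir /tmp "$1" "$2" "$3" || exit 2
# In epilogue:
#   remove_job_tmpdir /tmp "$1"
```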