TORQUE Resource Manager

TORQUE Administrator's Manual - Appendix G: Prologue & Epilogue Scripts

TORQUE provides the ability to run scripts before and/or after each job executes. With such scripts, a site can prepare systems, perform node health checks, prepend and append text to output and error log files, clean up systems, etc.

The table below shows which pbs_mom runs each script, with what privileges, and from which directory. All scripts must be in the $PBS_HOME/mom_priv/ directory and be available on every compute node. By default, $PBS_HOME is /usr/spool/PBS/. Mother Superior is the pbs_mom on the first node allocated to the job; Sisters are all other pbs_moms allocated to the job. $USER_HOME means the $HOME directory of the user running the job.

Script               | Execution Location | Privileges | Execution Directory | File Permissions
prologue             | Mother Superior    | root       | $PBS_HOME/mom_priv/ | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
epilogue             | Mother Superior    | root       | $USER_HOME          | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
prologue.<name>      | Mother Superior    | root       | $PBS_HOME/mom_priv/ | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
epilogue.<name>      | Mother Superior    | root       | $PBS_HOME/mom_priv/ | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
prologue.user        | Mother Superior    | user       | $PBS_HOME/mom_priv/ | readable and executable by root and other (e.g., -r-x---r-x)
epilogue.user        | Mother Superior    | user       | $USER_HOME          | readable and executable by root and other (e.g., -r-x---r-x)
prologue.parallel    | Sisters            | root       | $PBS_HOME/mom_priv/ | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
epilogue.parallel*   | Sisters            | root       |                     | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)
epilogue.precancel** | Mother Superior    | root       | $USER_HOME          | readable and executable by root and NOT writable by anyone besides root (e.g., -r-x------)

*  Available in version 2.1.
** Run after a job cancel request is received from pbs_server and before a kill signal is sent to the job process.
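
The ownership and permission rules in the table can be checked before a script is installed. The function below is a hypothetical helper, not part of TORQUE; it assumes a POSIX find(1) and applies only to the root-run scripts (prologue.user and epilogue.user must additionally be readable and executable by other, which it does not check).

```sh
#!/bin/sh
# Hypothetical helper (not part of TORQUE): check that a mom_priv script is
# owned by root and not writable by group or other. Returns 0 if the file
# looks safe, 1 otherwise (with a reason on stderr).
check_script_perms() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "$f: no such file" >&2
        return 1
    fi
    # find prints the path only if the file's owner is root
    if [ -z "$(find "$f" -user root -print)" ]; then
        echo "$f: not owned by root" >&2
        return 1
    fi
    # find prints the path if the group-write (020) or other-write (002) bit is set
    if [ -n "$(find "$f" \( -perm -020 -o -perm -002 \) -print)" ]; then
        echo "$f: writable by group or other" >&2
        return 1
    fi
    return 0
}
```

For example, check_script_perms $PBS_HOME/mom_priv/prologue && echo ok reports whether the prologue satisfies the root-only rule.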

G.1 Script Environment

   The prologue and epilogue scripts can be very simple. On most systems, the script must declare its execution shell using the #!<SHELL> syntax (e.g., '#!/bin/sh'). In addition, the script may process the context-sensitive arguments that TORQUE passes to it.

Prolog Environment

   The following arguments are passed to the prologue, prologue.user and prologue.parallel scripts:
argv[1]   job id
argv[2]   job execution user name
argv[3]   job execution group name
argv[4]   job name (TORQUE 1.2.0p4 and higher only)
argv[5]   list of requested resource limits (TORQUE 1.2.0p4 and higher only)
argv[6]   job execution queue (TORQUE 1.2.0p4 and higher only)
argv[7]   job account (TORQUE 1.2.0p4 and higher only)
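
A prologue that must run on both older and newer TORQUE releases cannot count on argv[4] through argv[7] being present. A minimal sketch, using a hypothetical helper name_prologue_args (not part of TORQUE), that gives the arguments readable names and defaults the optional ones to empty strings:

```sh
#!/bin/sh
# Hypothetical helper (not part of TORQUE): copy the prologue's positional
# arguments into descriptively named variables.
name_prologue_args() {
    jobid=$1
    user=$2
    group=$3
    # argv[4]-argv[7] are only passed by TORQUE 1.2.0p4 and later,
    # so default them to empty strings for older releases.
    jobname=${4:-}
    reslist=${5:-}
    queue=${6:-}
    account=${7:-}
}

# In a prologue, call it with the script's own arguments:
#   name_prologue_args "$@"
# Demonstration with only the first three arguments, as an older release passes them:
name_prologue_args 13724.node01 user1 user1
echo "job=$jobid user=$user group=$group queue='$queue'"
```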

Epilog Environment

   TORQUE supplies the following arguments to the epilogue, epilogue.user, epilogue.precancel, and epilogue.parallel scripts:
argv[1]   job id
argv[2]   job execution user name
argv[3]   job execution group name
argv[4]   job name
argv[5]   session id
argv[6]   list of requested resource limits
argv[7]   list of resources used by job
argv[8]   job execution queue
argv[9]   job account
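
Arguments 6 and 7 each arrive as a single comma-separated string of key=value pairs (for example, cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:07). The hypothetical resource_value helper below, which is not part of TORQUE, extracts one entry; it assumes that no value itself contains a comma.

```sh
#!/bin/sh
# Hypothetical helper (not part of TORQUE): print the value for key $2 in the
# comma-separated key=value list $1, or nothing if the key is absent.
# Assumes no value contains a comma.
resource_value() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p"
}

# In an epilogue you might write:
#   walltime_used=$(resource_value "$7" walltime)
# Demonstration with a resources-used string of the documented form:
used="cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:07"
echo "walltime used: $(resource_value "$used" walltime)"
```

The key is anchored at the start of each entry, so looking up mem does not match vmem, and nodes does not match neednodes.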

   The epilogue.precancel script is run after a job cancel request is received by the MOM and before any signals are sent to job processes.  If this script exists, it is run whether the canceled job was active or idle.

   For all scripts, the environment passed to the script is empty. Standard input is connected to a system-dependent file; currently, on all systems this is /dev/null. Except for the epilogue scripts of an interactive job, standard output and standard error are connected to the output and error files associated with the job. For an interactive job, the pseudo-terminal connection is released when the job completes, so the epilogue's standard output and error point to /dev/null.
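
Because the environment is empty and standard input is /dev/null, a script should set up anything it needs itself. A sketch of a preamble, assuming the standard system directories listed below are where the site's tools live (adjust for your site):

```sh
#!/bin/sh
# The environment is empty, so set PATH explicitly instead of relying on
# inherited variables. This directory list is an assumption; adjust it.
PATH=/bin:/usr/bin:/sbin:/usr/sbin
export PATH

# stdin is /dev/null, so never wait for input. stdout and stderr go to the
# job's output and error files (except in an interactive job's epilogue).
echo "prologue: running on $(uname -n)"
echo "prologue: this line goes to the job's error file" >&2

# A real prologue would end with "exit 0" on success (see G.4).
```

To try a script by hand under similar conditions, env -i runs it with an empty environment, for example: env -i /bin/sh $PBS_HOME/mom_priv/prologue 1.node01 user1 user1 </dev/null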

G.2 Per Job Prologue and Epilogue Scripts

TORQUE supports per-job prologue and epilogue scripts through the qsub -T option. The scripts are named prologue.<name> and epilogue.<name>, where <name> is an arbitrary name. To use them, submit the job with qsub -T <name>.

Example Per Job Prologue Submission
Contents of $PBS_HOME/mom_priv/:

drwxr-x--x 2 root root 28672 Sep  8 15:36 jobs
-r-x------ 1 root root   107 Sep  8 16:31 prologue.prescript

$ qsub -T prescript jobscript.sh
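
Users need to know which <name> values are valid for qsub -T. A hypothetical helper (not part of TORQUE) that lists them from a mom_priv directory, skipping the .user, .parallel, and .precancel scripts from the table in this appendix; the default directory assumes the default $PBS_HOME:

```sh
#!/bin/sh
# Hypothetical helper (not part of TORQUE): list the <name> values that have
# a prologue.<name> or epilogue.<name> script in a mom_priv directory.
list_job_script_names() {
    dir=${1:-/usr/spool/PBS/mom_priv}   # assumes the default $PBS_HOME
    for f in "$dir"/prologue.* "$dir"/epilogue.*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=${f##*/}                    # strip the directory part
        name=${name#*.}                  # strip "prologue." or "epilogue."
        case $name in
            user|parallel|precancel) continue ;;   # special scripts, not -T names
        esac
        printf '%s\n' "$name"
    done | sort -u
}
```

With a mom_priv directory whose only scripts are those shown above, list_job_script_names prints prescript.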

G.3 Prologue and Epilogue Scripts Time Out

   TORQUE guards against hung or long-running prologue and epilogue scripts by placing an alarm around each script's execution. By default, the alarm goes off after 5 minutes. If a script exceeds this time, it is terminated and the node is marked down. The timeout can be adjusted by setting the prologalarm parameter in the mom_priv/config file.
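
For example, to allow the scripts 10 minutes instead of the default 5, a line like the following could be added to $PBS_HOME/mom_priv/config. In that file the parameter is written with a leading $, and its value is in seconds (the default corresponds to 300). pbs_mom reads this file when it starts, so the change takes effect once pbs_mom re-reads its configuration, for instance after a restart.

```
$prologalarm 600
```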

While TORQUE is executing the epilogue, epilogue.user, or epilogue.precancel scripts, the job is in the E (exiting) state.

G.4 Prolog Error Processing

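   If the prologue script executes successfully, it should exit with a zero status. Otherwise, it should return the appropriate error code from the table below. pbs_mom reports the script's exit status to pbs_server, which in turn takes the associated action. The negative codes are not returned by the script itself (a script's exit status is always in the range 0-255); they describe failures that pbs_mom detects while running the script, such as a timeout or a permission error. The following table describes each exit code for the prologue scripts and the action taken.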
Exit Code | Description                                                              | Action
-4        | The script timed out                                                     | Job will be requeued
-3        | The wait(2) call returned an error                                       | Job will be requeued
-2        | Input file could not be opened                                           | Job will be requeued
-1        | Permission error (script is not owned by root, or is writable by others) | Job will be requeued
0         | Successful completion                                                    | Job will run
1         | Abort exit code                                                          | Job will be aborted
>1        | Other                                                                    | Job will be requeued
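
A node health check is a typical use of these codes. A minimal sketch, assuming a hypothetical site scratch directory, that returns 0 so the job runs when the directory is usable and 2 (requeue) when it is not; the message goes to stderr, which is the job's error file:

```sh
#!/bin/sh
# Sketch of a prologue node health check following the exit-code table:
# 0 lets the job run, 1 aborts it, >1 requeues it.
# The scratch directory passed in is a hypothetical site-specific path.
health_check() {
    scratch=$1
    if [ ! -d "$scratch" ] || [ ! -w "$scratch" ]; then
        # A node problem rather than a job problem: request a requeue (>1)
        echo "prologue: $scratch is unusable on $(uname -n)" >&2
        return 2
    fi
    return 0
}

# A real prologue would end with something like:
#   health_check /scratch
#   exit $?
```

Returning 2 rather than 1 reflects a design choice: a broken node is not the job's fault, so requeueing is usually preferable to aborting.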

Example 1

   Below are example prologue and epilogue scripts that write the arguments passed to them to the job's standard output file:

prologue script:
#!/bin/sh

echo "Prologue Args:"
echo "Job ID: $1"
echo "User ID: $2"
echo "Group ID: $3"
echo ""

exit 0

Output written to the job's standard output:

Prologue Args:
Job ID: 13724.node01
User ID: user1
Group ID: user1

epilogue script:
#!/bin/sh

echo "Epilogue Args:"
echo "Job ID: $1"
echo "User ID: $2"
echo "Group ID: $3"
echo "Job Name: $4"
echo "Session ID: $5"
echo "Resource List: $6"
echo "Resources Used: $7"
echo "Queue Name: $8"
echo "Account String: $9"
echo ""

exit 0

Output written to the job's standard output:

Epilogue Args:
Job ID: 13724.node01
User ID: user1
Group ID: user1
Job Name: script.sh
Session ID: 28244
Resource List: neednodes=node01,nodes=1,walltime=00:01:00
Resources Used: cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:07
Queue Name: batch
Account String: 

Example 2

The Ohio Supercomputer Center contributed the following scripts:
"prologue creates a unique temporary directory on each node assigned to a job before the job begins to run, and epilogue deletes that directory after the job completes. (Note that having a separate temporary directory on each node is probably not as good as having a good, high performance parallel filesystem.)"

prologue
#!/bin/sh
# Create TMPDIR on all the nodes
# Copyright 1999, 2000, 2001 Ohio Supercomputer Center
# prologue gets (at least) 3 arguments; only these are used:
# 1 -- jobid
# 2 -- userid
# 3 -- grpid
#
jobid=$1
user=$2
group=$3
nodefile=/var/spool/pbs/aux/$jobid
if [ -r $nodefile ] ; then
    nodes=$(sort $nodefile | uniq)
else
    nodes=localhost
fi
tmp=/tmp/pbstmp.$jobid
for i in $nodes ; do
    ssh "$i" "mkdir -m 700 $tmp && chown $user:$group $tmp"
done
exit 0

epilogue
#!/bin/sh
# Clear out TMPDIR
# Copyright 1999, 2000, 2001 Ohio Supercomputer Center
# epilogue gets 9 arguments:
# 1 -- jobid
# 2 -- userid
# 3 -- grpid
# 4 -- job name
# 5 -- sessionid
# 6 -- resource limits
# 7 -- resources used
# 8 -- queue
# 9 -- account
#
jobid=$1
nodefile=/var/spool/pbs/aux/$jobid
if [ -r $nodefile ] ; then
    nodes=$(sort $nodefile | uniq)
else
    nodes=localhost
fi
tmp=/tmp/pbstmp.$jobid
for i in $nodes ; do
    ssh $i rm -rf $tmp
done
exit 0

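Improperly written prologue, prologue.user, and prologue.parallel scripts can have dramatic effects on job scheduling: a nonzero exit status aborts or requeues the job (see G.4), and a script that times out causes the node to be marked down (see G.3).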