Appendix G: Prologue & Epilogue Scripts
TORQUE provides administrators the ability to run scripts before and/or after each job executes. With such a script, a site can prepare systems, perform node health checks, prepend and append text to output and error log files, clean up systems, and so forth. The following table shows which MOM runs which script. All scripts must be in the $PBS_HOME/mom_priv/ directory and be available on every compute node. $PBS_HOME by default is /usr/spool/PBS/. Mother Superior is the pbs_mom on the first node allocated. Sisters are all other pbs_moms, although note that a Mother Superior is also a sister node.
* available in Version 2.1
G.1 Script Environment
The prolog and epilog scripts can be very simple. On most systems, the script must declare the execution shell using the #!<SHELL> syntax (for example, '#!/bin/sh'). In addition, the script may want to process context-sensitive arguments passed by TORQUE to the script.
Prolog Environment
The following arguments are passed to the prologue, prologue.user, and prologue.parallel scripts:
Epilog Environment
TORQUE supplies the following arguments to the epilogue, epilogue.user, epilogue.precancel, and epilogue.parallel scripts:
The epilogue.precancel script is run after a job cancel request is received by the MOM and before any signals are sent to job processes. If this script exists, it is run whether the canceled job was active or idle.

For all scripts, the environment passed to the script is empty. Standard input for the scripts is connected to a system-dependent file; currently, for all systems, this is /dev/null. Except for the epilogue scripts of an interactive job, standard output and standard error are connected to the output and error files associated with the job. For an interactive job, since the pseudo-terminal connection is released after the job completes, standard output and standard error point to /dev/null.

G.2 Per Job Prologue and Epilogue Scripts
TORQUE supports per job prologue and epilogue scripts when using the qsub -l option. The syntax is:
qsub -l prologue=<prologue_script_path> epilogue=<epilogue_script_path> <script>
The path can be either relative (from the directory where the job is submitted) or absolute. The files must be owned by the user with at least read and execute privileges, and must not be writable by group or other.
$PBS_HOME/mom_priv/:
-r-x------ 1 usertom usertom 24 2009-11-09 16:11 prologue_script.sh
-r-x------ 1 usertom usertom 24 2009-11-09 16:11 epilogue_script.sh
Example:
$ qsub -l prologue=/home/usertom/dev/prologue_script.sh epilogue=/home/usertom/dev/epilogue_script.sh job14.pl

G.3 Prologue and Epilogue Scripts Time Out
TORQUE guards against runaway prologue and epilogue scripts by placing an alarm around each script's execution. By default, the alarm goes off after 5 minutes of execution. If the script exceeds this time, it is terminated and the node is marked down. The timeout can be adjusted by setting the prologalarm parameter in the mom_priv/config file.
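One way a script can stay inside this window is to bound its own slow steps. The sketch below is an illustration, not TORQUE behavior: it assumes the GNU coreutils timeout command is installed on the node, and run_bounded and STEP_LIMIT are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch: keep a slow prologue/epilogue step well inside the MOM's alarm.
# STEP_LIMIT and run_bounded are illustrative names, not part of TORQUE.
STEP_LIMIT=60   # seconds; assumed to be far below the configured alarm

# Run a command and kill it if it exceeds the given limit.
# GNU coreutils `timeout` exits with status 124 when the limit is hit.
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "step timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Stand-in for a real health check or cleanup command.
run_bounded "$STEP_LIMIT" true
```

A step that overruns returns 124 from run_bounded, so the script can decide how to exit instead of waiting for the MOM to terminate it.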
G.4 Prologue Error Processing
If the prologue script executes successfully, it should exit with a zero status. Otherwise, the script should return the appropriate error code as defined in the table below. The pbs_mom will report the script's exit status to pbs_server, which will in turn take the associated action. The following table describes each exit code for the prologue scripts and the action taken.
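The zero-on-success convention can be sketched as follows. This is a hedged illustration only: check_scratch, prologue_main, SCRATCH_DIR, and FAILURE_STATUS are hypothetical names, and the failure value is a placeholder to be replaced with the code from the table whose action the site wants.

```shell
#!/bin/sh
# Sketch: a prologue that reports success or failure through its exit status.
# SCRATCH_DIR and FAILURE_STATUS are illustrative; take the real failure
# code from the exit-code table for the action pbs_server should take.
SCRATCH_DIR=/tmp
FAILURE_STATUS=1   # placeholder value, not a recommendation

# Return 0 if the given directory exists and is writable.
check_scratch() {
    [ -d "$1" ] && [ -w "$1" ]
}

prologue_main() {
    if check_scratch "$SCRATCH_DIR"; then
        return 0
    fi
    echo "prologue: $SCRATCH_DIR is not writable" >&2
    return "$FAILURE_STATUS"
}

# The script's exit status is the status of its last command.
prologue_main
```

Keeping the check in a function makes it easy to add further checks later while still ending the script with a single, well-defined status.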
Example 1
Following are example prologue and epilogue scripts that write the arguments passed to them in the job's standard output file:
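A minimal sketch of such a script, assuming only that TORQUE passes the job context as positional arguments (job ID first, as in the comments of Example 2). The function name print_job_args and the "prologue" label are illustrative; the same body could serve as an epilogue with a different label.

```shell
#!/bin/sh
# Sketch: write every argument received to standard output, which for
# non-interactive jobs is connected to the job's output file.
# print_job_args is an illustrative name, not part of TORQUE.
print_job_args() {
    label=$1
    shift
    echo "$label args:"
    n=1
    for arg in "$@"; do
        echo "  $n: $arg"
        n=$((n + 1))
    done
}

print_job_args "prologue" "$@"
```

Looping over "$@" avoids hard-coding the argument count, which differs between the prologue and epilogue scripts.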
Example 2
The Ohio Supercomputer Center contributed the following scripts:
prologue
#!/bin/sh
# Create TMPDIR on all the nodes
# Copyright 1999, 2000, 2001 Ohio Supercomputer Center
# prologue gets 3 arguments:
# 1 -- jobid
# 2 -- userid
# 3 -- grpid
#
jobid=$1
user=$2
group=$3
nodefile=/var/spool/pbs/aux/$jobid
if [ -r "$nodefile" ] ; then
    nodes=$(sort "$nodefile" | uniq)
else
    nodes=localhost
fi
tmp=/tmp/pbstmp.$jobid
for i in $nodes ; do
    # The quoted string runs both commands on the remote node
    ssh "$i" "mkdir -m 700 $tmp && chown $user:$group $tmp"
done
exit 0
epilogue
#!/bin/sh
# Clear out TMPDIR
# Copyright 1999, 2000, 2001 Ohio Supercomputer Center
# epilogue gets 9 arguments:
# 1 -- jobid
# 2 -- userid
# 3 -- grpid
# 4 -- job name
# 5 -- sessionid
# 6 -- resource limits
# 7 -- resources used
# 8 -- queue
# 9 -- account
#
jobid=$1
nodefile=/var/spool/pbs/aux/$jobid
if [ -r "$nodefile" ] ; then
    nodes=$(sort "$nodefile" | uniq)
else
    nodes=localhost
fi
tmp=/tmp/pbstmp.$jobid
for i in $nodes ; do
    # Remove the per-job scratch directory on each node
    ssh "$i" rm -rf "$tmp"
done
exit 0
© 2001-2010 Adaptive Computing Enterprises, Inc.