TORQUE supports job preemption by allowing authorized users to
suspend and resume jobs. Preemption is implemented in one of two ways.
If the node supports OS-level preemption, TORQUE detects this during
the configure process and enables it. Otherwise, the MOM may
be configured to launch a custom checkpoint script in order to support
preempting a job. Using a custom checkpoint script requires that the job
understand how to resume itself from a checkpoint after the preemption
occurs.
Configuring a Checkpoint Script on a MOM
To configure the MOM to support a checkpoint script, set the
$checkpoint_script parameter in the MOM's configuration file,
$TORQUEHOME/mom_priv/config. The checkpoint
script should have execute permissions set. A typical config file might
look like:
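
As a minimal sketch, a config file that sets only this parameter could look like the following. The script path /usr/local/sbin/mom-checkpoint.sh is a hypothetical placeholder, not a file shipped with TORQUE; substitute the location of your own executable checkpoint script.

```
# $TORQUEHOME/mom_priv/config
# Hypothetical script path; replace with your own checkpoint script.
$checkpoint_script /usr/local/sbin/mom-checkpoint.sh
```
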
Second, to enable the checkpoint script, change the value of
MOM_CHECKPOINT to 1 in .../src/include/pbs_config.h. In some
instances, MOM_CHECKPOINT may already be defined as 1. Because this
is a compile-time setting, the MOM must be rebuilt for a change to
take effect. The new line should be:
#define MOM_CHECKPOINT 1