[torqueusers] trap preempt signals?
Brock Palen
brockp at umich.edu
Wed Dec 3 17:43:38 MST 2008
We have preemption set up on some of our nodes. It is enforced by
Moab, but in the end PBS is the one servicing the pbs_rerunjob() call,
right?
I would like to trap the signal before the job dies and is requeued.
I quickly hacked together the following to find out roughly how long I have:
function preempted {
    echo "SIG 15!!!!1"
    y=0
    # Count seconds until the job is finally killed
    while true
    do
        echo $y
        echo $y >> "/tmp/$PBS_JOBID"
        y=$(($y+1))
        sleep 1
    done
}
trap preempted SIGTERM
sleep 100
When I preempt the job manually with:
mjobctl -R JOBID
I do not get any output in /tmp/$PBS_JOBID or in the .o file.
If I delete the job with:
mjobctl -c JOBID
I get 1 to 5 in the .o file and in /tmp/$PBS_JOBID.
Do any normal signals get sent to the batch script during a
normal restart?
I know it is a race, but some people want to copy a few small
checkpoint files around when a job is preempted, so they are ready for
the restart.
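For reference, here is a minimal sketch of the kind of handler I have in mind. It assumes the batch script really does receive a catchable SIGTERM some seconds before the final kill, which is exactly what is in question for the requeue case. CKPT_SRC and CKPT_DST are hypothetical paths, and the "sleep 1" is only a stand-in for the real application.

```shell
#!/bin/sh
# Hypothetical locations (assumptions, not part of any TORQUE/Moab setup):
# where the application writes its checkpoints, and where the copies
# should land so the restarted job can find them.
CKPT_SRC="${CKPT_SRC:-/tmp/${PBS_JOBID:-demo}.ckpt}"
CKPT_DST="${CKPT_DST:-/tmp/${PBS_JOBID:-demo}.saved}"

save_checkpoints() {
    # Keep this short: the handler is racing the final kill.
    mkdir -p "$CKPT_DST" || return
    if [ -d "$CKPT_SRC" ]; then
        # Copy the contents of the checkpoint directory, keeping timestamps.
        cp -pR "$CKPT_SRC"/. "$CKPT_DST"/
    fi
}
trap save_checkpoints TERM

# Stand-in for the real application. It runs in the background so that
# the shell sits in "wait", which returns as soon as a trapped signal
# arrives; a foreground command would delay the handler until it exits.
sleep 1 &
wait $!
```

Running the work in the background and waiting on it matters here: with a plain foreground command the shell does not run the trap until that command finishes, which eats into the few seconds available.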
Any ideas?
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985