[torqueusers] trap preempt signals?

Brock Palen brockp at umich.edu
Wed Dec 3 17:43:38 MST 2008


We have preemption set up for some of our nodes.  This is enforced by
Moab, but in the end it is PBS that services the pbs_rerunjob()
request, right?

I would like to trap the signal before the job dies and is requeued.

I quickly hacked together the following script to find out roughly how
long I have:

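# Handler: count the seconds until the job is killed, so the last number
# written shows how much time was left after the signal arrived.
function preempted {
     echo "SIG 15!!!!1"
     y=0
     while true
     do
       echo "$y"
       echo "$y" >> "/tmp/$PBS_JOBID"
       y=$((y+1))
       sleep 1
     done
}

trap preempted SIGTERM

# Stand-in for the real workload.
sleep 100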


When I preempt the job manually with
mjobctl -R JOBID

I do not get any output in /tmp/$PBS_JOBID or in the .o file.
If I delete the job with
mjobctl -c JOBID

I get 1 to 5 in the .o file and in /tmp/$PBS_JOBID.

Are any normal signals sent to the batch script during a normal
restart?

I know it is a race, but some people want to copy a few small
checkpoint files when a job is preempted, so that they are ready for
the restart.
Any ideas?
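
For illustration, the kind of handler people have in mind might look roughly like the sketch below. It is only a sketch under assumptions: it presumes SIGTERM actually reaches the batch script before the kill, which is exactly what is in question for the requeue case. The names save_checkpoints, CKPT_SRC, CKPT_DST and demo.ckpt are hypothetical, and the self-sent TERM only simulates the batch system so the script can be tried outside a job. The workload runs in the background and the script blocks in wait, because the shell defers a trap while a foreground command is running but interrupts wait as soon as a trapped signal arrives.

```shell
# Hypothetical locations: where the application writes checkpoints and
# where they should be kept for the restart. Temp dirs keep the demo
# self-contained.
CKPT_SRC=${CKPT_SRC:-$(mktemp -d)}
CKPT_DST=${CKPT_DST:-$(mktemp -d)}

# Handler: copy the small checkpoint files as quickly as possible.
save_checkpoints() {
    if cp -p "$CKPT_SRC"/*.ckpt "$CKPT_DST"/; then
        echo "checkpoints copied to $CKPT_DST"
    fi
}
trap save_checkpoints TERM

# Stand-in for a checkpoint written by the real application.
echo state > "$CKPT_SRC/demo.ckpt"

# Stand-in for the real workload, run in the background so the trap
# is not held up until it finishes.
sleep 30 &
work_pid=$!

# Demo only: simulate the batch system sending SIGTERM after 1 second.
( sleep 1; kill -TERM $$ ) &

# wait returns as soon as the trapped signal arrives; the handler runs then.
wait "$work_pid" || true

# Clean up the stand-in workload.
kill "$work_pid" 2>/dev/null || true
```

In a real job script the simulated kill line would be dropped, and the time available to the handler would still be bounded by whatever delay the batch system allows between SIGTERM and SIGKILL.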


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985




