[torqueusers] Signalling a job with qsig -s SIGUSR2 seems to TERM as well?

Atwood, Robert C r.atwood at imperial.ac.uk
Wed Aug 30 11:11:48 MDT 2006


 
Hi, 
Perhaps I do not understand how to use qsig -s, but it seems pretty
straightforward from the man page and the documentation: if I send

qsig -s SIGUSR2

or

qsig -s USR2

or

qsig -s 12  (on this system)

it should pass the signal SIGUSR2 to the job, shouldn't it? The
documentation does not mention SIGTERM being sent as well, but that
seems to happen on my installation: Torque 2.1.2-snap.200607191251 with
the default scheduler (the OS is the SUSE 10-based ClusterVisionOS (CVOS)
on Intel EM64T processors).
The Bash version is GNU bash, version 3.00.16(1)-release (x86_64-suse-linux).
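Signal numbers are platform-dependent, so before relying on the numeric form (qsig -s 12) it may be worth confirming the mapping on the compute nodes. A minimal sketch using the bash builtin `kill -l`, which converts a signal name to its number and back; on Linux x86/x86-64 USR2 is 12, but other platforms may differ:

```shell
#!/bin/bash
# Map the signal name to its number, then the number back to its name,
# using the bash builtin `kill -l` (it prints names without the SIG prefix).
usr2_num=$(kill -l USR2)
echo "USR2 is signal number $usr2_num"
echo "signal $usr2_num is SIG$(kill -l "$usr2_num")"
```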



Here is a test shell script (testsig):


#!/bin/bash
trap 'echo "Singal USR2 received";date' USR2
trap 'echo "Singal TERM received";date' TERM
trap -p
while [[ 1 ]]
do
    a=1
done

I submit it with

qsub testsig -q test -l walltime=1:00:00

and then send it USR2 with qsig -s. The job's stdout file
(testsig.o####) then contains the following:


trap -- 'echo "Singal USR2 received";date' SIGUSR2
trap -- 'echo "Singal TERM received";date' SIGTERM
Singal USR2 received
Wed Aug 30 17:18:37 BST 2006
Singal TERM received
Wed Aug 30 17:18:38 BST 2006

After that, the job exits. However, when I run testsig in the background
from a command line and issue

kill -USR2 %1

the script does not exit.
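The command-line behaviour can also be checked without job control. A minimal sketch, assuming the script may signal itself with `kill -USR2 $$` in place of `kill -USR2 %1`, and with a short bounded loop standing in for testsig's endless busy-wait: once a USR2 trap is installed, bash runs the handler instead of the default action (terminate) and carries on.

```shell
#!/bin/bash
# A USR2 trap replaces the default action (terminate), so the shell keeps running.
got_usr2=0
trap 'echo "Signal USR2 received"; date; got_usr2=1' USR2

# Send USR2 to this shell, much as `kill -USR2 %1` does from an interactive shell.
kill -USR2 $$

# Keep working, as testsig's loop does (bounded here so the sketch terminates).
for i in 1 2 3; do
    a=$i
done

echo "still running after USR2 (got_usr2=$got_usr2)"
```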

The effect in my real job script is that signalling the job via qsig
prevents it from cleaning up its scratch files. The script is something
like this pseudoscript:

#!/bin/bash
#PBS -l walltime=1:00:00

(create a scratch dir on local disk)
(copy files to scratch dir)
(run the PROGRAM)
(copy files to the master node)
(delete the scratch dir)

Despite signal handlers in PROGRAM that work correctly outside of
Torque, signalling via qsig -s causes the job to terminate in the middle
of the (run the PROGRAM) step: the job receives both USR2 and TERM, and
the remaining steps in the shell script are never executed.
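One common shell pattern for making the cleanup steps more robust is sketched below, under stated assumptions: `run_in_scratch` is a hypothetical helper name, `mktemp -d` creates the scratch directory, and the command passed in stands in for PROGRAM. The job shell runs PROGRAM in the background and waits for it, because bash postpones trap handlers while a foreground child is running, whereas `wait` returns as soon as a trapped signal arrives. This only helps if the shell itself survives long enough to run its handlers; no handler can catch a later SIGKILL, and the sketch does not settle whether Torque signals only the shell or every process in the job.

```shell
#!/bin/bash
# Hypothetical helper: run a command inside a private scratch directory and
# remove that directory afterwards, even if this shell receives TERM or INT.
run_in_scratch() (
    scratch=$(mktemp -d) || exit 1
    child=
    # EXIT fires on every exit of this subshell, including the `exit` calls in
    # the TERM/INT handlers; copying results back would also belong here.
    trap 'rm -rf "$scratch"' EXIT
    # On TERM, pass the signal on to the command, then exit with 128+15.
    trap '[ -n "$child" ] && kill -TERM "$child" 2>/dev/null; exit 143' TERM
    trap 'exit 130' INT
    cd "$scratch" || exit 1
    # Background + wait: `wait` is interrupted by a trapped signal, so the
    # handler runs promptly instead of after the command finishes.
    "$@" &
    child=$!
    wait "$child"
)

# Example: a placeholder command stands in for PROGRAM.
run_in_scratch sh -c 'echo "working in $PWD"'
```

The helper's exit status is the command's own status on a normal run, and 143 if the job shell was sent TERM.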

Similarly, I would like to send a signal to obtain intermediate results
"on demand" from a lengthy job, but sending the signal terminates the
job, again despite signal handlers that work correctly outside of Torque.
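For the "on demand" case, the handler-side logic can be sketched as follows, assuming the state worth reporting is visible to the shell (here a loop counter) and using `PROGRESS_FILE` as a hypothetical variable for the output path. The handler writes a snapshot and returns, so the main loop carries on. Within Torque this can only help if the job is not also sent TERM, which is exactly the behaviour in question.

```shell
#!/bin/bash
# Sketch: write intermediate results when USR2 arrives, then keep computing.
# PROGRESS_FILE and the counting loop are hypothetical stand-ins for the
# real program's output file and work.
progress_file=${PROGRESS_FILE:-./progress.txt}
step=0

dump_progress() {
    echo "step $step at $(date)" > "$progress_file"
}
trap dump_progress USR2

while [ "$step" -lt 5 ]; do
    step=$((step + 1))
    # Demo only: in a batch job the signal would come from qsig -s USR2.
    if [ "$step" -eq 3 ]; then
        kill -USR2 $$
    fi
done
echo "finished at step $step; last progress dump: $(cat "$progress_file")"
```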


What am I doing wrong or misunderstanding? Or would people recommend an
entirely different way to do this?



Thanks,
Robert
