[torqueusers] Signalling a job with qsig -s SIGUSR2 seems to TERM as well?
Atwood, Robert C
r.atwood at imperial.ac.uk
Wed Aug 30 11:11:48 MDT 2006
Hi,
Perhaps I do not understand how to use qsig -s, but it seems pretty
straightforward from the man page and the documentation. If I send
qsig -s SIGUSR2
Or
qsig -s USR2
Or
qsig -s 12 (on this system)
then it should pass SIGUSR2 to the job, shouldn't it? The documentation does
not mention sending SIGTERM as well, but that seems to happen on my
installation: Torque version 2.1.2-snap.200607191251 with the default
scheduler, on SUSE 10-based ClusterVisionOS (CVOS) on Intel EM64T processors.
The bash version is GNU bash, version 3.00.16(1)-release (x86_64-suse-linux).
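Which number a signal name maps to depends on the platform, which is why the numeric form only holds "on this system". A quick way to check the mapping is bash's own kill -l builtin. This is a small sketch assuming bash on Linux, where SIGUSR2 is 12 and SIGTERM is 15; it says nothing about how qsig itself parses the -s argument.

```shell
#!/bin/bash
# Print the number for each signal name, then the name for number 12,
# using bash's kill -l builtin (signal numbers differ between platforms).
for sig in USR2 TERM; do
  echo "$sig = $(kill -l "$sig")"
done
echo "12 = $(kill -l 12)"
```

On Linux this should print USR2 = 12, TERM = 15 and 12 = USR2.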
If I run the following shell script (testsig):
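#!/bin/bash
# Report (but do not exit on) SIGUSR2 and SIGTERM
trap 'echo "Singal USR2 received";date' USR2
trap 'echo "Singal TERM received";date' TERM
# List the traps that are now installed
trap -p
# Busy-loop until the job is killed or reaches its walltime
while [[ 1 ]]
do
  a=1
done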
After submitting it with
qsub testsig -q test -l walltime=1:00:00
and signalling the job with qsig -s as above, I get the following in the
testsig.o#### stdout file:
trap -- 'echo "Singal USR2 received";date' SIGUSR2
trap -- 'echo "Singal TERM received";date' SIGTERM
Singal USR2 received
Wed Aug 30 17:18:37 BST 2006
Singal TERM received
Wed Aug 30 17:18:38 BST 2006
After that, the job exits. However, when I run testsig in the background
from an interactive shell, sending it
kill -USR2 %1
does not cause it to exit.
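The interactive comparison can also be scripted so that it does not rely on job control (%1). This is a minimal sketch assuming bash; survives_usr2 is a hypothetical helper that starts a child shell with testsig-style traps (a sleep loop instead of the busy loop), sends it only SIGUSR2, and reports whether it is still alive a second later.

```shell
#!/bin/bash
# survives_usr2 (hypothetical helper): start a child shell with testsig-style
# traps, send it SIGUSR2 only, and return 0 if it is still running afterwards.
survives_usr2() {
  bash -c '
    trap "echo USR2 received" USR2
    trap "echo TERM received" TERM
    while true; do sleep 0.1; done
  ' &
  local child=$!
  local status=1
  sleep 1                                      # let the child install its traps
  kill -USR2 "$child"
  sleep 1                                      # give the trap time to run
  kill -0 "$child" 2>/dev/null && status=0     # is the child still alive?
  kill -KILL "$child"                          # clean up; KILL cannot be trapped
  wait "$child" 2>/dev/null
  return "$status"
}

if survives_usr2; then echo "still running after USR2"; else echo "exited after USR2"; fi
```

Outside Torque this should report that the child is still running, which matches the kill -USR2 %1 observation above.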
The effect in my real job script is that signalling the job via qsig
does not give it a chance to clean up its scratch files. The job script
looks something like this pseudoscript:
#!/bin/bash
#PBS -l walltime=1:00:00
(create a scratch dir on local disk)
(copy files to scratch dir)
(run the PROGRAM)
(copy files to the master node)
(delete the scratch dir)
Despite signal handlers in PROGRAM that work correctly outside of
Torque, signalling the job via qsig -s causes it to terminate in the
middle of the (run the PROGRAM) step, after receiving both USR2 and TERM,
so the remaining steps in the shell script are never executed.
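One general bash pattern for making the copy-back and delete steps run even when the job shell receives SIGTERM is to start the program in the background, wait for it, and put the cleanup in a TERM trap. Bash defers a trap until a foreground command finishes, but a trapped signal interrupts the wait builtin immediately. This is an illustration under assumptions, not a confirmed fix for the Torque behaviour described here: job_body is a hypothetical stand-in for the pseudoscript, a sleep 60 that leaves a partial out.txt plays the part of PROGRAM, and it assumes the job shell itself receives the SIGTERM. No trap can help if a SIGKILL follows, since SIGKILL cannot be caught.

```shell
#!/bin/bash
# Sketch only: job_body is a hypothetical stand-in for the pseudoscript, and a
# "sleep 60" that leaves a partial output file stands in for PROGRAM.
job_body() {
  results=$1                                  # destination for copied-back files
  scratch=$(mktemp -d) || exit 1              # (create a scratch dir on local disk)
  cd "$scratch" || exit 1

  # (run the PROGRAM) in the background so the shell can react to signals
  ( echo partial > out.txt; exec sleep 60 ) &
  prog_pid=$!

  cleanup() {
    cp -r "$scratch"/. "$results"/            # (copy files to the master node)
    rm -rf "$scratch"                         # (delete the scratch dir)
  }
  # On SIGTERM: stop the program, still copy back and clean up, then exit
  # with the conventional 128 + 15 status.
  trap 'kill "$prog_pid" 2>/dev/null; cleanup; exit 143' TERM

  wait "$prog_pid"      # a trapped signal interrupts wait immediately
  cleanup               # normal path once PROGRAM finishes
}
```

For example, starting ( job_body "$dest" ) & and sending that subshell SIGTERM after a second should leave out.txt in "$dest" and exit with status 143.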
Alternatively, I would like to send a signal to obtain intermediate
results "on demand" from a lengthy job, but sending the signal terminates
the job, again despite signal handlers that work correctly outside the
Torque context.
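For the "on demand" case, the handler side can be sketched in a shell loop: a USR2 trap that only writes the current state and returns lets the loop carry on. This is a hedged sketch assuming bash; long_job, its step counter and the progress file argument are hypothetical stand-ins for a real job's state, and it only shows the handler, not how Torque delivers the signal.

```shell
#!/bin/bash
# long_job: hypothetical stand-in for a lengthy computation. On SIGUSR2 it
# writes its current step to the file given as $1 and keeps going.
long_job() {
  progress_file=$1
  step=0
  # Dump intermediate state and return to the loop; no exit here.
  trap 'echo "step $step" > "$progress_file"' USR2
  while (( step < 50 )); do
    step=$((step + 1))
    # An interrupted wait returns >128, hence || true; the trap then runs
    # at once instead of after the sleep finishes.
    sleep 0.1 & wait $! || true
  done
  echo "done after $step steps" > "$progress_file"
}
```

Run as ( long_job /tmp/progress ) & and signalled with kill -USR2, it should write a "step N" line and then finish normally with "done after 50 steps".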
What am I doing wrong or misunderstanding? Or would people recommend
an entirely different way to do what I want?
Thanks,
Robert