[torqueusers] Signalling a job with qsig -s SIGUSR2 seems to TERM as well?

Garrick Staples garrick at clusterresources.com
Wed Aug 30 20:26:34 MDT 2006


Sounds like a bug.  I'll do some testing.

On Wed, Aug 30, 2006 at 06:11:48PM +0100, Atwood, Robert C alleged:
>  
> Hi, 
> Perhaps I do not understand how to use qsig -s, but it seems pretty
> straightforward from the man page and the documentation. If I send
> qsig -s SIGUSR2
> or
> qsig -s USR2
> or
> qsig -s 12  (the number for SIGUSR2 on this system)
> 
> it should pass the signal SIGUSR2 to the job, shouldn't it? Neither source
> mentions sending SIGTERM as well, but that is what seems to happen on my
> installation, which runs version 2.1.2-snap.200607191251 with the default
> scheduler. The OS is the SUSE 10 based ClusterVisionOS (CVOS) on Intel EM64T
> processors.
> Bash version is GNU bash, version 3.00.16(1)-release (x86_64-suse-linux).
> 
> 
> 
> If I run the following shell script (testsig):
> 
> 
> #!/bin/bash
> trap 'echo "Singal USR2 received";date' USR2
> trap 'echo "Singal TERM received";date' TERM
> trap -p
> while [[ 1 ]]
> do
> a=1
> done
> 
> submitted with: qsub testsig -q test -l walltime=1:00:00
> 
> I get the following in the testsig.o#### stdout file:
> 
> 
> trap -- 'echo "Singal USR2 received";date' SIGUSR2
> trap -- 'echo "Singal TERM received";date' SIGTERM
> Singal USR2 received
> Wed Aug 30 17:18:37 BST 2006
> Singal TERM received
> Wed Aug 30 17:18:38 BST 2006
> 
> and the job exits. However, when running testsig from the command line,
> issuing the command
>  kill -USR2 %1
> does not cause the script to exit.
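
The command-line comparison can be reproduced outside Torque with a small
script. This is only a minimal sketch, assuming bash and mktemp are available.
The temporary file names and the sleep-based loop (used here instead of the
busy loop) are illustrative choices, not part of the original report.

```shell
#!/bin/bash
# Local check, outside Torque: a bash script that traps USR2 should
# keep running after it receives that signal.
workdir=$(mktemp -d)    # hypothetical scratch location for this demo

cat > "$workdir/testsig" <<'EOF'
#!/bin/bash
trap 'echo "Signal USR2 received"' USR2
trap 'echo "Signal TERM received"; exit 0' TERM
echo ready
while true; do sleep 1; done
EOF

bash "$workdir/testsig" > "$workdir/out" &
pid=$!

# Wait until the traps are installed before signalling.
until grep -q ready "$workdir/out" 2>/dev/null; do sleep 1; done

kill -USR2 "$pid"
sleep 2
if kill -0 "$pid" 2>/dev/null; then survived=yes; else survived=no; fi
echo "still running after USR2: $survived"

# Only an explicit TERM ends the script.
kill -TERM "$pid"
wait "$pid"
log=$(cat "$workdir/out")
echo "$log"
rm -rf "$workdir"
```

If the script is still alive after USR2 here but exits under qsig, the extra
TERM is coming from somewhere other than bash itself.
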
> 
> The effect in my real job script is that signalling the job via qsig
> does not allow the job to clean up its scratch files. The script is
> something like this pseudoscript:
> 
> #!/bin/bash
> #PBS -l walltime=1:00:00
> 
> (create a scratch dir on local disk)
> (copy files to scratch dir)
> (run the PROGRAM)
> (copy files to the master node)
> (delete the scratch dir)
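
The staging steps in the pseudoscript could be sketched with the copy-back and
cleanup moved into an EXIT trap, so they run however the shell leaves the
(run the PROGRAM) step. This is an assumption-laden sketch: run_program, the
file names, and the mktemp directories are hypothetical stand-ins, and the
subshell exists only so the result can be inspected afterwards. It cannot help
against SIGKILL, which cannot be trapped.

```shell
#!/bin/bash
# Hypothetical stand-in for PROGRAM: writes one result file to scratch.
run_program() { echo "results" > "$scratch/output.dat"; }

results=$(mktemp -d)    # stand-in for the directory on the master node
scratch=$(mktemp -d)    # scratch dir on local disk

(
    cleanup() {
        cp "$scratch"/* "$results"/    # copy files to the master node
        rm -rf "$scratch"              # delete the scratch dir
    }
    trap cleanup EXIT
    # Turn TERM into an ordinary exit (128 + 15) so the EXIT trap runs.
    trap 'exit 143' TERM

    echo "input" > "$scratch/input.dat"    # copy files to scratch dir
    run_program
)

ls "$results"
```
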
> 
> Despite implementing signal handlers in PROGRAM that work correctly
> outside of Torque, signalling via qsig -s causes the job to terminate in
> the middle of the (run the PROGRAM) step: it receives both the USR2 and
> TERM signals, and the following steps in the shell script are not executed.
> Alternatively, I would like to send a signal to obtain intermediate results
> "on demand" from a lengthy job, but sending the signal terminates the
> job, again despite signal handlers that work correctly outside the Torque
> context.
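
One common shell pattern for the "on demand" case is to run the program in the
background and have the job shell forward signals to it, so that the shell
reaches the steps after the program. The following is a sketch of that pattern
only, run locally. The program function, job.sh, and the messages are
hypothetical, and whether it behaves the same when signalled through qsig is
untested here.

```shell
#!/bin/bash
workdir=$(mktemp -d)    # hypothetical location for the demo job script

cat > "$workdir/job.sh" <<'EOF'
#!/bin/bash
# Hypothetical stand-in for PROGRAM with its own signal handlers.
program() {
    trap 'echo "intermediate results written"' USR2
    trap 'exit 0' TERM
    echo "program ready"
    while true; do sleep 1; done
}

# Forward signals from the job shell to the program.
trap 'kill -USR2 "$child"' USR2
trap 'kill -TERM "$child"' TERM
program &
child=$!

# wait returns early when a trapped signal arrives, so loop on it.
while kill -0 "$child" 2>/dev/null; do wait "$child"; done
echo "remaining job steps run here"
EOF

bash "$workdir/job.sh" > "$workdir/out" &
job=$!
until grep -q "program ready" "$workdir/out" 2>/dev/null; do sleep 1; done

kill -USR2 "$job"    # ask for intermediate results
sleep 3
kill -TERM "$job"    # end the demo
wait "$job"
log=$(cat "$workdir/out")
echo "$log"
rm -rf "$workdir"
```
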
> 
> 
> What am I doing wrong or misunderstanding? Or do people recommend
> another way entirely to do what I want to do?
> 
> 
> 
> Thanks,
> Robert