[torqueusers] SIGUSR1 results in a SIGTERM

Jeremy D Rogers jdrogers at northwestern.edu
Tue Mar 22 06:49:47 MDT 2011


Hi all,
I've been digging through the mailing list and docs, but I'm stumped.
I'm trying to have my program write data and exit cleanly on receipt
of SIGUSR1 (or any other signal for that matter).

The program works as expected when run directly with mpirun, but when
submitted through the queue the job is killed with signal 15 right
after receiving signal 10 (or 12). This is true of a small cluster of
mine running torque 2.4.1 as well as our university system running
moab and, I _think_, torque249 (I'm still trying to figure out how to
tell definitively as a user).

From the docs, I gather that the queue manager should pass along
signals by default, and it appears to be doing so:
$ qsub -l nodes=2:ppn=2 qsubmit.sh
4340.biophotonics1.bp1.loc
$ qsig -s SIGUSR1 4340.biophotonics1.bp1.loc
$ cat montecarlo.o4340
mpirun: Forwarding signal 10 to job
Caught SIGNAL 10 on proc 0, exiting..
Caught SIGNAL 10 on proc 0, exiting..
Caught SIGNAL 10 on proc 2, exiting..
Caught SIGNAL 10 on proc 3, exiting..
mpirun: killing job...
Caught SIGNAL 15 on proc 0, exiting..
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 31855 on node bp1n2 exited
on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
Caught SIGNAL 15 on proc 2, exiting..
Caught SIGNAL 15 on proc 1, exiting..
Caught SIGNAL 10 on proc 1, exiting..
Caught SIGNAL 15 on proc 3, exiting..
mpirun: clean termination accomplished
4 total processes killed (some possibly by mpirun during cleanup)

It appears that signal 10 is being forwarded properly and my program
catches it and begins to exit, but then the server sends a SIGTERM
which kills everything before my jobs can finish writing their data.
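
For reference, the behaviour being aimed for can be sketched roughly
as below. This is only an illustration in Python, not the actual
program from this post: the function name run(), the out_path
argument, and the stand-in "work" are all hypothetical, and it assumes
a POSIX system where SIGUSR1 exists. The handler only records the
signal; the write happens in the main loop, and SIGTERM is routed to
the same handler so that a follow-up SIGTERM does not cut the write
short. It does not address whatever mpirun or the batch system does
after forwarding the signal.

```python
import signal
import time


def run(rank, out_path, max_steps=1000):
    """Hypothetical work loop: run until a signal arrives, then save."""
    caught = []  # signals seen so far, in arrival order

    def request_stop(signum, frame):
        # Do no I/O here; just note the signal for the main loop.
        caught.append(int(signum))

    # Assumption: SIGUSR1 and SIGTERM are both treated as "save and quit".
    signal.signal(signal.SIGUSR1, request_stop)
    signal.signal(signal.SIGTERM, request_stop)

    results = []
    for step in range(max_steps):
        results.append(step)  # stand-in for one unit of real work
        if caught:
            print(f"Caught SIGNAL {caught[0]} on proc {rank}, exiting..")
            break
        time.sleep(0.01)

    # Write the data outside the handler, after the loop has stopped.
    with open(out_path, "w") as f:
        f.write("\n".join(str(r) for r in results))

    return (caught[0] if caught else None), len(results)
```

With this structure a second signal arriving during the write is only
appended to the list rather than terminating the process, although a
SIGKILL from the batch system could still not be caught.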

Any suggestions on how to debug this would be appreciated.
Thanks,
JDR

--
Jeremy D. Rogers, Ph.D.
Postdoctoral Fellow
Biomedical Engineering
Northwestern University

