[torquedev] "Fixing" qsig -s USR1 and kill_delay on torque 2.5.x

Alan Wild alan at madllama.net
Tue Mar 13 08:53:52 MDT 2012

(wipes egg off face)... if it isn't obvious.. I haven't had to do any heavy
C-coding in a few months.

I wrote the loop in the previous pass knowing full well that strcpy() doesn't
support overlapping memory regions.  All of a sudden last night... I
recalled that memmove() does.  Here's an updated patch against the latest 2.5.11
snapshot using memmove() and no loop.

Yes.. and I also realize I screwed up the diff.  That's what I get for trying
to hand-tune it to not have it bounded by whitespace.  This one should apply
cleanly with  patch -p1


diff -rN -U2 torque-2.5.11-snap.201203081434-old/src/resmom/start_exec.c
--- torque-2.5.11-snap.201203081434-old/src/resmom/start_exec.c 2012-03-08 15:34:57.000000000 -0600
+++ torque-2.5.11-snap.201203081434/src/resmom/start_exec.c     2012-03-13 09:47:55.000000000 -0500
@@ -1997,4 +1997,9 @@
     int k;
+    if (strlen(buf)+5 <= MAXPATHLEN) {
+        memmove(buf+5,buf,strlen(buf)+1);
+        strncpy(buf, "exec ", 5);
+    }
     /* pass name of shell script on pipe */
     /* will be stdin of shell  */
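
For anyone who wants to try the memmove() approach outside the Torque tree,
here is a minimal standalone sketch of the same in-place prefix insertion.
The helper name prepend_exec() and its bufsize parameter are hypothetical
(they are not part of Torque), and the size check assumes bufsize is the
total capacity of the buffer, which is a slightly different test than the
MAXPATHLEN comparison in the patch. It also uses memcpy() for the prefix
where the patch uses strncpy(); the effect is the same five bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a Torque function): prepend "exec " to the
 * NUL-terminated string held in buf, in place.
 * bufsize is assumed to be the total capacity of buf in bytes.
 * Returns 1 on success, 0 if the result would not fit (buf is untouched).
 */
static int prepend_exec(char *buf, size_t bufsize)
{
    static const char prefix[] = "exec ";
    const size_t plen = sizeof(prefix) - 1;   /* 5, without the NUL */
    size_t len;

    assert(buf != NULL);
    len = strlen(buf);

    /* need room for the prefix, the old string, and the terminating NUL */
    if (len + plen + 1 > bufsize)
        return 0;

    /* source and destination overlap, so memmove() and not strcpy() */
    memmove(buf + plen, buf, len + 1);

    /* overwrite the freed bytes with the prefix; no NUL is written here */
    memcpy(buf, prefix, plen);
    return 1;
}
```

Calling prepend_exec() on a buffer holding "/tmp/job.SC" leaves
"exec /tmp/job.SC" in it, provided the buffer has at least 17 bytes.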

On Mon, Mar 12, 2012 at 3:38 PM, Alan Wild <alan at madllama.net> wrote:

> NOTE:  we are presently running 2.5.7, but I've confirmed that this change
> is still applicable to 2.5.9.  I've not had a chance to look at 3.x or 4.x
> in any way.
> We recently wanted to change our kill_delay on our system to allow jobs
> adequate time to properly clean up in the event of a qdel.  At the same
> time I started playing with qsig and discovered that sending a USR1 signal
> to a process would cause it to terminate (even if the jobscript/job properly
> handled SIGUSR1).
> It turns out that both issues are related to the same problem: The failure
> of the user's shell (by default) to catch and properly handle signals.
> This has been discussed here (and on torqueusers) several times in the past
> and the general recommendation has always been to have the user add the
> necessary "trap" statements to their .bashrc (or appropriate file) in
> addition to putting them in their job script.
> The reason for these recommendations stems from the process hierarchy
> that is created by pbs_mom:
> pbs_mom,6488 -p
>   `-bash,10919
>      `-16398.hpdjsl001,10978 -l /var/spool/pbs/mom_priv/jobs/16398.hpdjsl001.SC
> pbs_mom launches a shell (in my case bash) which, in turn, invokes the job
> script.  When the user executes a qsig or qdel... the server passes the
> signal to the mom and the mom signals both of these processes.  If the job
> script has the necessary trap calls in it... it, of course, handles the
> signal properly, but the shell process will exit... and many shells will
> exit even on a seemingly innocuous SIGUSR1.
> If the shell process exits... the pbs_mom believes the job to have died
> and automatically enters into a mode where it sends a SIGTERM to the
> jobscript and ~5 seconds later a SIGKILL.  This happens regardless
> of the signal the user sent (even SIGUSR1) or in the event of a qdel.
> However, given that the goal of a qdel is to remove a job... most Torque
> users are probably none the wiser that it isn't going through the "correct"
> termination sequence.
> We have a large user community, and most of our users are not technical
> enough that I can reasonably expect them to properly implement the changes
> to their individual login files.  I've considered having our system
> configuration files updated, but this would affect all users (even those
> that don't submit jobs) and we would be stuck maintaining a solution that
> works for each of about five different shells we have installed.
> So I wondered if there couldn't be a better way.
> I looked at the pbs_mom source and found how the pbs_mom passes the script
> command to be invoked into the shell process.  It does so via a pipe which is
> connected to the shell's stdin. So I thought, "why couldn't the shell
> simply 'exec' the job script instead of running it as a simple command
> line?"  It turns out that the pipe is closed shortly after the script's
> path is passed to the shell so it's not like pbs_mom was going to talk to
> the shell anymore... so why leave the shell running?  If the shell is no
> longer running... that's one less process to have to worry about catching
> signals... and potentially it's less memory wasted on the compute node.
> I threw together this rather small patch as a prototype:
> diff -urN torque-2.5.7/src/resmom/start_exec.c torque-2.5.7-new/src/resmom/start_exec.c
> --- torque-2.5.7/src/resmom/start_exec.c        2011-06-17 17:15:57.000000000 -0500
> +++ torque-2.5.7-new/src/resmom/start_exec.c    2012-03-12 13:29:13.000000000 -0500
> @@ -1966,5 +1966,11 @@
>                 {
>                 int k;
> +               if (strlen(buf)+5 <= MAXPATHLEN) {
> +                       for (i=strlen(buf); i>=0; i--)
> +                               buf[i+5] = buf[i];
> +                       strncpy(buf, "exec ", 5);
> +               }
> +
>                 /* pass name of shell script on pipe */
>                 /* will be stdin of shell  */
> ...And found it to work as expected in our test environment (with
> admittedly limited testing).  All this does (if there is still space in
> the buffer) is shift everything over 5 characters and insert "exec " at
> the beginning of the command line. The shell runs the command line, which,
> of course, now exec's the script. The script inherits the pid of the shell as
> well as its stdin/stdout/stderr so pbs_demux appears to function correctly.
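
The pid inheritance described above can be checked outside of Torque with a
small sketch like the one below. It is not pbs_mom code: run_line_in_shell()
is a hypothetical helper that starts /bin/sh with its stdin on a pipe (as the
mom is described as doing), writes one command line to it, and returns the
first number the command prints. The test command, sh -c 'echo $$', is just a
stand-in for a job script that reports its own pid.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork /bin/sh with stdin/stdout on pipes, send it one
 * command line, and return the first number it prints (-1 on any error).
 * *shell_pid receives the pid of the shell we started.
 */
static long run_line_in_shell(const char *line, pid_t *shell_pid)
{
    int in_pipe[2], out_pipe[2];
    char reply[64];
    size_t used = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(in_pipe) != 0)
        return -1;
    if (pipe(out_pipe) != 0) {
        close(in_pipe[0]); close(in_pipe[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(in_pipe[0]); close(in_pipe[1]);
        close(out_pipe[0]); close(out_pipe[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: become the shell, reading commands from the pipe */
        dup2(in_pipe[0], STDIN_FILENO);
        dup2(out_pipe[1], STDOUT_FILENO);
        close(in_pipe[0]); close(in_pipe[1]);
        close(out_pipe[0]); close(out_pipe[1]);
        execl("/bin/sh", "sh", (char *)NULL);
        _exit(127);
    }

    /* parent: pass the command line, then close the pipe */
    close(in_pipe[0]);
    close(out_pipe[1]);
    if (write(in_pipe[1], line, strlen(line)) < 0)
        used = 0;   /* fall through; the read below will just see EOF */
    close(in_pipe[1]);

    /* collect whatever the command printed until EOF */
    while (used < sizeof(reply) - 1 &&
           (n = read(out_pipe[0], reply + used, sizeof(reply) - 1 - used)) > 0)
        used += (size_t)n;
    reply[used] = '\0';
    close(out_pipe[0]);

    waitpid(pid, NULL, 0);
    *shell_pid = pid;
    return used ? strtol(reply, NULL, 10) : -1;
}
```

Passing "exec /bin/sh -c 'echo $$'\n" should return the same pid that is
stored in *shell_pid, while the same line without "exec " should return a
different pid, since the shell then forks a child and stays around as its
parent.
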
> Every shell I've investigated (sh, csh, ksh, bash, zsh) all appear to
> honor the "exec" command in the same manner so this appears to be a viable
> solution to this problem (premature shell termination) without requiring
> users (or admins) to add "trap" statements to dotfiles to protect that one
> process.  For the record, this doesn't get anyone off the hook about
> installing trap's in the job scripts (or signal handlers in the processes
> themselves), but this appears to remove one of the larger barriers in
> leveraging qsig(1) and extended kill_delay settings.
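
As a reminder of what "signal handlers in the processes themselves" amounts
to for a C job, here is a minimal sketch using sigaction(). The names
last_signal, record_signal() and catch_signal() are made up for this example;
a real job would do its checkpointing or cleanup in its main loop after
noticing the flag, rather than just recording the signal number.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t last_signal = 0;

/* Handler: only record which signal arrived and return. */
static void record_signal(int signo)
{
    last_signal = signo;
}

/*
 * Hypothetical helper: install record_signal() for signo (for example
 * SIGUSR1 or SIGTERM). Returns 0 on success, -1 on failure.
 */
static int catch_signal(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */

    return sigaction(signo, &sa, NULL);
}
```

A job would call catch_signal(SIGUSR1) and catch_signal(SIGTERM) early in
main() and then poll last_signal from its work loop.
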
> I'll concede there could be a flaw in my logic, and as I stated above,
> this has only had limited testing thus far, but I would love to hear what I
> may have missed and why this couldn't be a viable change in Torque.
> This was tested by qsub'ing the following perl script directly (no shell
> job-script around it).  This code simply catches signals, prints the time
> that they were received, and after the first signal is caught... prints the
> time in 1 second intervals (since you'll never see the final SIGKILL you
> can at least count off the seconds).
> #!/usr/bin/perl -l
> my $stop;
> $|=1;
>  @SIG{(CATCH)} = (sub { $stop||=1; print join ' ', shift, '@', scalar
> localtime }) x CATCH;
>  sleep unless $stop;
> print (scalar localtime), sleep 1 while 1;
> When tested with a qdel, you'll see a TERM signal logged at the time of
> invocation, followed by the number of printouts which correspond with your
> kill_delay setting (defaults to 2 seconds).  Finally you see a second
> SIGTERM and then ~5 seconds later the output stops (because the process
> receives a SIGKILL).  For the unfamiliar, when the server asks a mom to do
> a SIGKILL... it is hard coded to SIGTERM first and then ~5 seconds later to
> try a SIGKILL.
> Without my patch above (and without adding trap statements to your
> .bashrc) this script will output two SIGTERM's (typically within the same
> second) with about 5 more seconds of printouts (before the final kill).
> mom_logs will confirm that the initial SIGTERM terminated the shell
> process, and that the mom then automatically initiated a job termination
> (via the second TERM and KILL).
> I also won't take any offense if someone wants to implement the patch more
> efficiently, I was just trying to do what I wanted with the minimal amount
> of change to the torque code.
> Thanks,
> -Alan
> --
> alan at madllama.net http://humbleville.blogspot.com

alan at madllama.net http://humbleville.blogspot.com