[torquedev] "Fixing" qsig -s USR1 and kill_delay on torque 2.5.x

Ken Nielson knielson at adaptivecomputing.com
Tue Mar 13 10:24:26 MDT 2012


Alan,

I'm sorry I did not get to this sooner. I would like to add this and test it.
We are up against a deadline to get torque 2.5.11 out. I would like to put
this in for 2.5.12 and get some testing for it. We can do a 2.5.12 release
when we feel confident in this fix.

Regards

Ken

On Mon, Mar 12, 2012 at 2:38 PM, Alan Wild <alan at madllama.net> wrote:

> NOTE:  we are presently running 2.5.7, but I've confirmed that this change
> is still applicable to 2.5.9.  I've not had a chance to look at 3.x or 4.x
> in any way.
>
> We recently wanted to change our kill_delay on our system to allow jobs
> adequate time to properly clean up in the event of a qdel.  At the same
> time I started playing with qsig and discovered that sending a USR1 signal
> to a process would cause it to terminate (even if the jobscript/job properly
> handled SIGUSR1).
>
> It turns out that both issues are related to the same problem: The failure
> of the user's shell (by default) to catch and properly handle signals.
> This has been discussed here (and on torqueusers) several times in the past
> and the general recommendation has always been to have the user add the
> necessary "trap" statements to their .bashrc (or appropriate file) in
> addition to putting them in their job script.
>
> The reason for these recommendations stems from the process hierarchy
> that is created by pbs_mom:
>
> pbs_mom,6488 -p
>   `-bash,10919
>      `-16398.hpdjsl001,10978 -l /var/spool/pbs/mom_priv/jobs/16398.hpdjsl001.SC
>
> pbs_mom launches a shell (in my case bash) which, in turn, invokes the job
> script.  When the user executes a qsig or qdel... the server passes the
> signal to the mom and the mom signals both of these processes.  If the job
> script has the necessary trap calls in it... it, of course, handles the
> signal properly, but the shell process will exit... and many shells will
> exit even on a seemingly innocuous SIGUSR1.
>
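To illustrate the point about the shell exiting on SIGUSR1: the default action for SIGUSR1 on POSIX systems is to terminate the process, so any process that has not installed a handler dies when it receives one. The following is a minimal C sketch of that behaviour, not Torque code. The names usr1_outcome and on_usr1 and the readiness pipe are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_usr1(int sig) { (void)sig; }   /* handler that only returns */

/*
 * Fork a child, optionally install a SIGUSR1 handler in it, send it SIGUSR1,
 * and report how it ended: the signal number if it was killed by a signal,
 * 0 if it exited normally, -1 on error.  (Hypothetical helper.)
 */
int usr1_outcome(int install_handler)
{
    int ready[2];
    if (pipe(ready) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                            /* child */
        sigset_t block, none;
        struct sigaction sa;

        close(ready[0]);
        sigemptyset(&none);
        sigemptyset(&block);
        sigaddset(&block, SIGUSR1);
        sigprocmask(SIG_BLOCK, &block, NULL);  /* hold SIGUSR1 until we wait */

        sa.sa_handler = install_handler ? on_usr1 : SIG_DFL;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGUSR1, &sa, NULL);

        if (write(ready[1], "x", 1) != 1)      /* tell the parent we are set up */
            _exit(2);
        close(ready[1]);
        sigsuspend(&none);                     /* unblock and wait atomically */
        _exit(0);                              /* reached only if a handler ran */
    }

    char c;
    int status;
    close(ready[1]);
    if (read(ready[0], &c, 1) != 1) {          /* child never became ready */
        close(ready[0]);
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }
    close(ready[0]);
    kill(pid, SIGUSR1);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling usr1_outcome(0) models the intermediate shell with no trap installed and should report SIGUSR1 as the cause of death, while usr1_outcome(1) models a process with a handler and should report a normal exit.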
> If the shell process exits... the pbs_mom believes the job to have died
> and automatically enters into a mode where it sends a SIGTERM to the
> jobscript and ~5 seconds later a SIGKILL.  This happens regardless
> of the signal the user sent (even SIGUSR1) or in the event of a qdel.
> However, given that the goal of a qdel is to remove a job... most Torque
> users are probably none the wiser that it isn't going through the "correct"
> termination sequence.
>
> We have a large user community, and most users are not technical enough
> that I can reasonably expect them to implement these changes properly in
> their individual login files.  I've considered having our system
> configuration files updated, but this would affect all users (even those
> that don't submit jobs) and we would be stuck maintaining a solution that
> works for each of about five different shells we have installed.
>
> So I wondered if there couldn't be a better way.
>
> I looked at the pbs_mom source and found how the pbs_mom passes the script
> command to be invoked to the shell process.  It does so via a pipe which is
> connected to the shell's stdin. So I thought, "why couldn't the shell
> simply 'exec' the job script instead of running it as a simple command
> line?"  It turns out that the pipe is closed shortly after the script's
> path is passed to the shell so it's not like pbs_mom was going to talk to
> the shell anymore... so why leave the shell running?  If the shell is no
> longer running... that's one less process to have to worry about catching
> signals... and potentially it's less memory wasted on the compute node.
>
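The idea above can be checked outside of Torque. The C sketch below starts /bin/sh with its stdin on a pipe, writes a single command line to it, closes the pipe, and compares the pid reported by the command with the pid of the shell. It only mimics the mechanism described in the message and is not taken from pbs_mom. The function name command_runs_in_shell_pid and the inner "sh -c 'echo $$'" command are assumptions chosen for the demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run one command line through a shell whose stdin is a pipe.
 * Returns 1 if the command ran under the shell's own pid, 0 if it ran
 * under a different pid, -1 on error.  (Hypothetical helper.)
 */
int command_runs_in_shell_pid(int use_exec)
{
    int to_sh[2], from_sh[2];
    if (pipe(to_sh) != 0 || pipe(from_sh) != 0)
        return -1;

    pid_t shell = fork();
    if (shell < 0)
        return -1;
    if (shell == 0) {                          /* child: becomes the shell */
        dup2(to_sh[0], STDIN_FILENO);
        dup2(from_sh[1], STDOUT_FILENO);
        close(to_sh[0]);   close(to_sh[1]);
        close(from_sh[0]); close(from_sh[1]);
        execl("/bin/sh", "sh", (char *)NULL);
        _exit(127);                            /* exec of the shell failed */
    }
    close(to_sh[0]);
    close(from_sh[1]);

    /* The inner sh prints its own pid via $$. */
    const char *line = use_exec ? "exec /bin/sh -c 'echo $$'\n"
                                : "/bin/sh -c 'echo $$'\n";
    ssize_t w = write(to_sh[1], line, strlen(line));
    close(to_sh[1]);                           /* nothing more is sent */

    char out[64];
    size_t used = 0;
    ssize_t n;
    while (used < sizeof(out) - 1 &&
           (n = read(from_sh[0], out + used, sizeof(out) - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(from_sh[0]);
    waitpid(shell, NULL, 0);

    if (w < 0 || used == 0)
        return -1;
    return strtol(out, NULL, 10) == (long)shell;
}
```

With use_exec set, the command replaces the shell and reports the shell's pid, so there is no separate shell process left to receive signals. Without exec, the command would be expected to run in a forked child under a different pid, although that has not been verified here for every shell.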
> I threw together this rather small patch as a prototype:
>
> diff -urN torque-2.5.7/src/resmom/start_exec.c torque-2.5.7-new/src/resmom/start_exec.c
> --- torque-2.5.7/src/resmom/start_exec.c        2011-06-17 17:15:57.000000000 -0500
> +++ torque-2.5.7-new/src/resmom/start_exec.c    2012-03-12 13:29:13.000000000 -0500
> @@ -1966,5 +1966,11 @@
>                 {
>                 int k;
>
> +               if (strlen(buf)+5 <= MAXPATHLEN) {
> +                       for (i=strlen(buf); i>=0; i--)
> +                               buf[i+5] = buf[i];
> +                       strncpy(buf, "exec ", 5);
> +               }
> +
>                 /* pass name of shell script on pipe */
>                 /* will be stdin of shell  */
>
> ...And found it to work as expected in our test environment (with
> admittedly limited testing).  All this does (if there is still space in
> the buffer) is shift everything over 5 characters and insert "exec " at
> the beginning of the command line. The shell reads that command line and,
> of course, now exec's the script. The script inherits the pid of the shell as
> well as its stdin/stdout/stderr so pbs_demux appears to function correctly.
>
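The shift-and-insert step described above can also be written with memmove instead of a hand-rolled loop. This is a standalone sketch and not a drop-in replacement for the patch: the helper name prepend_exec is hypothetical, and it takes the buffer capacity as an explicit argument because the real size of buf in start_exec.c is not shown in the diff. Its capacity check also counts the terminating NUL, which is an assumption about how the buffer is sized.

```c
#include <string.h>

/*
 * Prepend "exec " to the NUL-terminated command line in buf.
 * bufsize is the total capacity of buf in bytes.
 * Returns 0 on success, or -1 (leaving buf untouched) if the
 * result, including its terminating NUL, would not fit.
 */
int prepend_exec(char *buf, size_t bufsize)
{
    static const char prefix[] = "exec ";
    const size_t plen = sizeof(prefix) - 1;    /* 5, without the NUL */
    size_t len = strlen(buf);

    if (len + plen + 1 > bufsize)
        return -1;

    memmove(buf + plen, buf, len + 1);         /* shift string and its NUL */
    memcpy(buf, prefix, plen);                 /* fill the gap with "exec " */
    return 0;
}
```

As in the original patch, a command line that is too long is left unchanged, so the job would still run, just without the exec.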
> Every shell I've investigated (sh, csh, ksh, bash, zsh) appears to
> honor the "exec" command in the same manner so this appears to be a viable
> solution to this problem (premature shell termination) without requiring
> users (or admins) to add "trap" statements to dotfiles to protect that one
> process.  For the record, this doesn't get anyone off the hook about
> installing trap's in the job scripts (or signal handlers in the processes
> themselves), but this appears to remove one of the larger barriers in
> leveraging qsig(1) and extended kill_delay settings.
>
> I'll concede there could be a flaw in my logic, and as I stated above,
> this has only had limited testing thus far, but I would love to hear what I
> may have missed and why this couldn't be a viable change in Torque.
>
> This was tested by qsub'ing the following perl script directly (no shell
> job-script around it).  This code simply catches signals, prints the time
> that they were received, and after the first signal is caught... prints the
> time in 1 second intervals (since you'll never see the final SIGKILL you
> can at least count off the seconds).
>
> #!/usr/bin/perl -l
> use constant CATCH => qw/USR1 USR2 HUP TERM INT QUIT ABRT ILL FPE SEGV
> ALRM PIPE CHLD/;
> my $stop;
> $|=1;
>
> @SIG{(CATCH)} = (sub { $stop||=1; print join ' ', shift, '@', scalar
> localtime }) x CATCH;
>
> sleep unless $stop;
> print (scalar localtime), sleep 1 while 1;
>
> When tested with a qdel, you'll see a TERM signal logged at the time of
> invocation, followed by the number of printouts which corresponds with your
> kill_delay setting (defaults to 2 seconds).  Finally you see a second
> SIGTERM and then ~5 seconds later the output stops (because the process
> receives a SIGKILL).  For the unfamiliar, when the server asks a mom to do
> a SIGKILL... it is hard coded to SIGTERM first and then ~5 seconds later to
> try a SIGKILL.
>
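The TERM-then-KILL escalation described above follows a common pattern that can be sketched in C as below. This is not the pbs_mom implementation. The names term_then_kill and spawn_sleeper are hypothetical, and the grace period is a parameter here, whereas the message describes the mom's delay as hard coded to roughly 5 seconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a child that just waits; it ignores SIGTERM if ignore_term is set. */
pid_t spawn_sleeper(int ignore_term)
{
    int ready[2];
    if (pipe(ready) != 0)
        return -1;
    pid_t pid = fork();
    if (pid == 0) {
        close(ready[0]);
        signal(SIGTERM, ignore_term ? SIG_IGN : SIG_DFL);
        if (write(ready[1], "x", 1) != 1)      /* disposition is now in place */
            _exit(2);
        close(ready[1]);
        for (;;)
            pause();
    }
    close(ready[1]);
    if (pid > 0) {
        char c;
        if (read(ready[0], &c, 1) != 1) { /* child failed early; caller will see it */ }
    }
    close(ready[0]);
    return pid;
}

/*
 * Send SIGTERM to a child, give it grace_seconds to exit, then SIGKILL it.
 * Returns the signal that ended the child, 0 for a normal exit, -1 on error.
 */
int term_then_kill(pid_t pid, unsigned grace_seconds)
{
    int status;
    if (kill(pid, SIGTERM) != 0)
        return -1;
    for (unsigned waited = 0; ; waited++) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            break;                             /* child is gone */
        if (r < 0)
            return -1;
        if (waited >= grace_seconds) {         /* grace period is over */
            kill(pid, SIGKILL);
            if (waitpid(pid, &status, 0) < 0)
                return -1;
            break;
        }
        sleep(1);
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A child that does not handle SIGTERM is ended by the first signal, while one that ignores it survives the grace period and is then ended by SIGKILL, which is the pattern the test script output makes visible.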
> Without my patch above (and without adding trap statements to your
> .bashrc) this script will output two SIGTERM's (typically within the same
> second) with about 5 more seconds of printouts (before the final kill).
> mom_logs will confirm that the initial SIGTERM terminated the shell
> process, and that the mom then automatically initiated a job termination
> (via the second TERM and KILL).
>
> I also won't take any offense if someone wants to implement the patch more
> efficiently; I was just trying to do what I wanted with the minimal amount
> of change to the torque code.
>
> Thanks,
>
> -Alan
>
> --
> alan at madllama.net http://humbleville.blogspot.com