[torquedev] torquedev Digest, Vol 76, Issue 5

Alan Wild alan at madllama.net
Tue Mar 13 13:16:47 MDT 2012


I'm waiting to see if the community can find a hole in my logic.  I'll have
to admit that I'm relatively new to pbs, and I realize there are a lot of
things done for historical (or platform compatibility) reasons that I don't
appreciate.

One issue I see... what happens if the user's shell accepts:

/path/to/some/job/script.SC

which is what Torque does today, but not

exec /path/to/some/job/script.SC

... which is what I'm proposing?  I checked the shells that I have on hand
(sh, ksh, bash, csh, tcsh, zsh) and all of these appear fine, but I'm only
checking on RHEL Linux and I fully appreciate that other OSes' sh and csh
implementations can be quite different (though I would be quite surprised if
they can't exec).  While this covers the 95% case, there could be some
esoteric shell out there that proves incompatible.
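
A minimal sketch, in C, of how one might check a given shell on a given
platform.  It is not part of the patch: the function name shell_accepts_exec
and the marker exit code 42 are made up for illustration.  It feeds an
"exec ..." line to the shell on stdin and checks that the exit status of the
exec'd command comes back.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for popen() and pclose() */
#endif
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/* Returns 1 if `shell`, reading commands from stdin, honors an
 * "exec <command>" line; 0 otherwise (including when the shell is missing). */
int shell_accepts_exec(const char *shell)
{
    /* A missing shell exits before we write; do not die from SIGPIPE. */
    void (*old_handler)(int) = signal(SIGPIPE, SIG_IGN);
    int ok = 0;

    /* popen(..., "w") connects our writes to the shell's stdin. */
    FILE *in = popen(shell, "w");
    if (in != NULL) {
        /* If exec works, the shell is replaced by a process that exits 42
         * (42 is an arbitrary marker value chosen for this sketch). */
        fputs("exec /bin/sh -c 'exit 42'\n", in);

        int status = pclose(in);
        ok = status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 42;
    }

    signal(SIGPIPE, old_handler);
    return ok;
}
```

Calling it once per installed shell (for example /bin/csh or /bin/ksh) on
each target OS would give a quick compatibility list.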

That said, I find myself wondering why Torque uses the user's shell and
doesn't just use /bin/sh (which is pretty much going to exist on any Unix
machine out there).  I'm guessing that there could be some subtle things in
the user's environment that make the job script work under their shell that
could fail if invoked under /bin/sh (assuming they aren't an sh user)...
but at least Torque would be guaranteed of the shell's behavior (and it
would guarantee that the above "exec" issue couldn't happen).  Personally,
I'm tempted to suggest that Torque should have a --always-use-sh compile
time option :)

Of course, the environment problem I'm describing should be quite familiar
to anyone that uses cron and people have worked around that for years
(decades?) so I would think some sites could live with that option.

Again, I'm relatively new and there may be some subtle interaction between
pbs_mom, job_scripts and pbs_demux I'm not picking up on, but in the
testing I've been able to do thus far... it works fine for me.

-Alan

On Tue, Mar 13, 2012 at 11:24 AM, <torquedev-request at supercluster.org> wrote:
----------------------------------------------------------------------

Message: 1
Date: Tue, 13 Mar 2012 10:04:41 -0600
From: David Beer <dbeer at adaptivecomputing.com>
Subject: Re: [torquedev] "Fixing" qsig -s USR1 and kill_delay on
       torque 2.5.x
To: Torque Developers mailing list <torquedev at supercluster.org>
Message-ID:
       <CAFUQeZ1bpV-uzgm-qgQvAZnDZVY5J3bPvHSnkPOh-La4_hP+3g at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Having done the work it takes to configure these signals to work in the
system's current state, I'm all for addressing this issue. I'm wondering if
there are any community concerns about this change. Do you see any possible
regressions? What are the risks? Should we make this change something that
only happens if you turn it on in the mom config file? In some ways I like
this option because it is easy to turn off if there are regressions, but on
the other hand the kill_delay functionality is so cumbersome to set up that
it's essentially broken now. I'm interested to hear the community's input on
this patch.

David

On Tue, Mar 13, 2012 at 8:53 AM, Alan Wild <alan at madllama.net> wrote:

> (wipes egg off face)... if it isn't obvious... I haven't had to do any
> heavy C coding in a few months.
>
> I wrote the loop in the previous pass knowing full well that strcpy()
> doesn't support overlapping memory regions.  All of a sudden last night...
> I recalled that memmove() does.  Here's an updated patch against the latest
> 2.5.11 snapshot using memmove() and no loop.
>
> Yes... and I also realize I screwed up the diff.  That's what I get for
> trying to hand-tune it to not have it bounded by whitespace.  This one
> should apply cleanly with  patch -p1
>
> -Alan
>
> diff -rN -U2 torque-2.5.11-snap.201203081434-old/src/resmom/start_exec.c torque-2.5.11-snap.201203081434/src/resmom/start_exec.c
> --- torque-2.5.11-snap.201203081434-old/src/resmom/start_exec.c 2012-03-08 15:34:57.000000000 -0600
> +++ torque-2.5.11-snap.201203081434/src/resmom/start_exec.c     2012-03-13 09:47:55.000000000 -0500
> @@ -1997,4 +1997,9 @@
>      int k;
>
> +    if (strlen(buf)+5 <= MAXPATHLEN) {
> +        memmove(buf+5,buf,strlen(buf)+1);
> +        strncpy(buf, "exec ", 5);
> +    }
> +
>      /* pass name of shell script on pipe */
>      /* will be stdin of shell  */
>
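
For anyone who wants to try the shifting logic outside of pbs_mom, here is a
standalone sketch of the same memmove() technique.  It is an illustration,
not the patch itself: the function name prepend_exec is hypothetical, and it
takes the buffer size as an explicit argument because the real size of buf
in start_exec.c is not shown in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Prepend "exec " to the NUL-terminated string in buf, in place.
 * cap is the total size of buf in bytes.  Returns 0 on success, or
 * -1 (leaving buf untouched) if the result would not fit. */
int prepend_exec(char *buf, size_t cap)
{
    static const char prefix[] = "exec ";
    const size_t plen = sizeof(prefix) - 1;   /* 5, without the NUL */
    size_t len = strlen(buf);

    if (len + plen + 1 > cap)                 /* +1 for the terminator */
        return -1;

    memmove(buf + plen, buf, len + 1);        /* regions overlap: memmove */
    memcpy(buf, prefix, plen);                /* copy prefix, no NUL */
    return 0;
}
```

The bounds check counts the terminating NUL explicitly, so the caller only
needs to pass sizeof(buf).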
> On Mon, Mar 12, 2012 at 3:38 PM, Alan Wild <alan at madllama.net> wrote:
>
>
>> NOTE:  we are presently running 2.5.7, but I've confirmed that this
>> change is still applicable to 2.5.9.  I've not had a chance to look at
>> 3.x or 4.x in any way.
>>
>> We recently wanted to change our kill_delay on our system to allow jobs
>> adequate time to properly clean up in the event of a qdel.  At the same
>> time I started playing with qsig and discovered that sending a USR1
>> signal to a process would cause it to terminate (even if the jobscript/job
>> properly handled SIGUSR1).
>>
>> It turns out that both issues are related to the same problem: the failure
>> of the user's shell (by default) to catch and properly handle signals.
>> This has been discussed here (and on torqueusers) several times in the
>> past and the general recommendation has always been to have the user add
>> the necessary "trap" statements to their .bashrc (or appropriate file) in
>> addition to putting them in their job script.
>>
>> The reason for these recommendations stems from the process hierarchy
>> that is created by pbs_mom:
>>
>> pbs_mom,6488 -p
>>   `-bash,10919
>>      `-16398.hpdjsl001,10978 -l /var/spool/pbs/mom_priv/jobs/16398.hpdjsl001.SC
>>
>> pbs_mom launches a shell (in my case bash) which, in turn, invokes the
>> job script.  When the user executes a qsig or qdel... the server passes
>> the signal to the mom and the mom signals both of these processes.  If the
>> job script has the necessary trap calls in it... it, of course, handles
>> the signal properly, but the shell process will exit... and many shells
>> will exit even on a seemingly innocuous SIGUSR1.
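
The default behavior described above can be reproduced without Torque.  The
following is a small sketch, assuming /bin/sh exists; the helper name
shell_fate is made up for illustration.  It runs a script under sh -c and
reports whether the shell was killed by a signal.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `script` under /bin/sh -c and report how the shell ended:
 * the terminating signal number if it was killed by a signal,
 * 0 if it exited normally, -1 on error. */
int shell_fate(const char *script)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Start from the default disposition, as a fresh login would. */
        signal(SIGUSR1, SIG_DFL);
        execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        _exit(127);                 /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With the script "kill -s USR1 $$" the shell signals itself and is killed by
SIGUSR1; with "trap '' USR1; kill -s USR1 $$; exit 0" it survives, which is
the effect the recommended trap statements are after.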
>>
>> If the shell process exits... the pbs_mom believes the job to have died
>> and automatically enters into a mode where it sends a SIGTERM to the
>> jobscript and ~5 seconds later a SIGKILL.  This happens regardless of the
>> signal the user sent (even SIGUSR1) and also in the event of a qdel.
>> However, given that the goal of a qdel is to remove a job... most Torque
>> users are probably none the wiser that it isn't going through the
>> "correct" termination sequence.
>>
>> We have a large user community, and most of our users are not technical
>> enough for me to reasonably expect them to properly implement the changes
>> to their individual login files.  I've considered having our system
>> configuration files updated, but this would affect all users (even those
>> that don't submit jobs) and we would be stuck maintaining a solution that
>> works for each of about five different shells we have installed.
>>
>> So I wondered if there couldn't be a better way.
>>
>> I looked at the pbs_mom source and found how the pbs_mom passes the
>> script command to be invoked into the shell process.  It does so via a
>> pipe which is connected to the shell's stdin. So I thought, "why couldn't
>> the shell simply 'exec' the job script instead of running it as a simple
>> command line?"  It turns out that the pipe is closed shortly after the
>> script's path is passed to the shell so it's not like pbs_mom was going to
>> talk to the shell anymore... so why leave the shell running?  If the shell
>> is no longer running... that's one less process to have to worry about
>> catching signals... and potentially it's less memory wasted on the compute
>> node.
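
The mechanism described above (one command line written to a pipe that is
the shell's stdin, then the pipe is closed) can be sketched in isolation as
follows.  This is not the pbs_mom code; run_line_via_stdin is a hypothetical
name, and error handling is kept to a minimum (for instance, a shell that
fails to start could cause SIGPIPE on the write).

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start `shell` with its stdin connected to a pipe, write one command
 * line to it, close the pipe, and return the shell's exit status
 * (-1 on error or if the shell died from a signal). */
int run_line_via_stdin(const char *shell, const char *line)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        dup2(fds[0], STDIN_FILENO);   /* read end becomes the shell's stdin */
        close(fds[0]);
        close(fds[1]);
        execl(shell, shell, (char *)NULL);
        _exit(127);                   /* exec failed */
    }

    close(fds[0]);
    ssize_t n = write(fds[1], line, strlen(line));
    close(fds[1]);                    /* the shell sees EOF after the line */

    int status;
    if (waitpid(pid, &status, 0) < 0 || n < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Passing "exec /bin/sh -c 'exit 7'\n" as the line returns 7 from the same pid
the parent forked, because the shell was replaced instead of spawning a
child and waiting for it.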
>>
>> I threw together this rather small patch as a prototype:
>>
>> diff -urN torque-2.5.7/src/resmom/start_exec.c torque-2.5.7-new/src/resmom/start_exec.c
>> --- torque-2.5.7/src/resmom/start_exec.c        2011-06-17 17:15:57.000000000 -0500
>> +++ torque-2.5.7-new/src/resmom/start_exec.c    2012-03-12 13:29:13.000000000 -0500
>> @@ -1966,5 +1966,11 @@
>>                 {
>>                 int k;
>>
>> +               if (strlen(buf)+5 <= MAXPATHLEN) {
>> +                       for (i=strlen(buf); i>=0; i--)
>> +                               buf[i+5] = buf[i];
>> +                       strncpy(buf, "exec ", 5);
>> +               }
>> +
>>                 /* pass name of shell script on pipe */
>>                 /* will be stdin of shell  */
>>
>> ...And found it to work as expected in our test environment (with
>> admittedly limited testing).  All this does (if there is still space in
>> the buffer) is shift everything over 5 characters and insert "exec " at
>> the beginning of the command line. The shell runs the command line, which,
>> of course, now execs the script. The script inherits the pid of the shell
>> as well as its stdin/stdout/stderr so pbs_demux appears to function
>> correctly.
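
The pid inheritance is easy to confirm on its own.  A small sketch, assuming
a POSIX sh is what popen() runs; the function name prints_same_pid_twice is
invented for this example.  It runs a command that prints two pids and
compares them.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for popen() and pclose() */
#endif
#include <stdio.h>

/* Run `cmd` through the shell (via popen) and read two pids from its
 * output.  Returns 1 if they are equal, 0 if they differ, -1 on error. */
int prints_same_pid_twice(const char *cmd)
{
    FILE *out = popen(cmd, "r");
    if (out == NULL)
        return -1;

    long first = 0, second = 0;
    int fields = fscanf(out, "%ld %ld", &first, &second);
    int status = pclose(out);

    if (fields != 2 || status == -1)
        return -1;
    return first == second;
}
```

With "echo $$; exec sh -c 'echo $$'" both lines show the same pid, since the
second shell replaces the first.  With "echo $$; sh -c 'echo $$'; :" the
inner shell is a forked child and the pids differ.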
>>
>> Every shell I've investigated (sh, csh, ksh, bash, zsh) appears to honor
>> the "exec" command in the same manner, so this appears to be a viable
>> solution to this problem (premature shell termination) without requiring
>> users (or admins) to add "trap" statements to dotfiles to protect that one
>> process.  For the record, this doesn't get anyone off the hook about
>> installing traps in the job scripts (or signal handlers in the processes
>> themselves), but this appears to remove one of the larger barriers to
>> leveraging qsig(1) and extended kill_delay settings.
>>
>> I'll concede there could be a flaw in my logic, and as I stated above,
>> this has only had limited testing thus far, but I would love to hear what
>> I may have missed and why this couldn't be a viable change in Torque.
>>
>> This was tested by qsub'ing the following perl script directly (no shell
>> job-script around it).  This code simply catches signals, prints the time
>> that they were received, and after the first signal is caught... prints
>> the time in 1 second intervals (since you'll never see the final SIGKILL
>> you can at least count off the seconds).
>>
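>> #!/usr/bin/perl -l
>> use constant CATCH => qw/USR1 USR2 HUP TERM INT QUIT ABRT ILL FPE SEGV ALRM PIPE CHLD/;
>> my $stop;
>> $|=1;
>>
>> # log each caught signal with a timestamp and note that one has arrived
>> @SIG{(CATCH)} = (sub { $stop||=1; print join ' ', shift, '@', scalar localtime }) x CATCH;
>>
>> # block until the first signal interrupts the sleep
>> sleep unless $stop;
>> # then print the time once per second until killed
>> print (scalar localtime), sleep 1 while 1;
>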
> When tested with a qdel, you'll see a TERM signal logged at the time of
> invocation, followed by a number of printouts corresponding to your
> kill_delay setting (defaults to 2 seconds).  Finally you see a second
> SIGTERM and then ~5 seconds later the output stops (because the process
> receives a SIGKILL).  For the unfamiliar, when the server asks a mom to do
> a SIGKILL... it is hard coded to SIGTERM first and then ~5 seconds later
> to try a SIGKILL.
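
For readers unfamiliar with the pattern, a generic "SIGTERM, wait, then
SIGKILL" sequence looks roughly like the sketch below.  This is not Torque's
implementation and does not reproduce its kill_delay handling; the function
name terminate_child and the grace_seconds parameter are illustrative only.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGTERM to child `pid`, give it up to `grace_seconds` to exit,
 * then send SIGKILL.  Returns the signal that ended the child, 0 if it
 * exited on its own, or -1 on error. */
int terminate_child(pid_t pid, unsigned grace_seconds)
{
    int status;

    if (kill(pid, SIGTERM) < 0)
        return -1;

    /* Poll once per second so an early exit is noticed promptly. */
    for (unsigned waited = 0; ; waited++) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r < 0)
            return -1;
        if (r == pid)
            break;                          /* child is gone */
        if (waited >= grace_seconds) {
            if (kill(pid, SIGKILL) < 0)     /* grace period used up */
                return -1;
            if (waitpid(pid, &status, 0) < 0)
                return -1;
            break;
        }
        sleep(1);
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A job that traps SIGTERM and cleans up within the grace period exits on its
own terms; one that ignores it is ended by the SIGKILL.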
>
> Without my patch above (and without adding trap statements to your
> .bashrc) this script will output two SIGTERM's (typically within the same
> second) with about 5 more seconds of printouts (before the final kill).
> mom_logs will confirm that the initial SIGTERM terminated the shell
> process, and that the mom then automatically initiated a job termination
> (via the second TERM and KILL).
>
> I also won't take any offense if someone wants to implement the patch more
> efficiently; I was just trying to do what I wanted with the minimal amount
> of change to the torque code.
>
> Thanks,
>
> -Alan
>
> --
> alan at madllama.net http://humbleville.blogspot.com
>
> _______________________________________________
> torquedev mailing list
> torquedev at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torquedev
>
>


--
David Beer | Software Engineer
Adaptive Computing

------------------------------

Message: 2
Date: Tue, 13 Mar 2012 10:07:36 -0600 (MDT)
From: bugzilla-daemon at supercluster.org
Subject: [torquedev] [Bug 168] 2.5(.9) qsub does not seem to accept
       comma seperated -W argument
To: torquedev at supercluster.org
Message-ID: <20120313160736.7E6264121046 at http.supercluster.org>
Content-Type: text/plain; charset="UTF-8"

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=168

Ken Nielson <knielson at adaptivecomputing.com> changed:

          What    |Removed                     |Added
----------------------------------------------------------------------------
            Status|NEW                         |RESOLVED
        Resolution|                            |FIXED

--- Comment #6 from Ken Nielson <knielson at adaptivecomputing.com> 2012-03-13
10:07:36 MDT ---
Fixed in 2.5.11 revision 5803

--
Configure bugmail:
http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


------------------------------

Message: 3
Date: Tue, 13 Mar 2012 10:24:26 -0600
From: Ken Nielson <knielson at adaptivecomputing.com>
Subject: Re: [torquedev] "Fixing" qsig -s USR1 and kill_delay on
       torque 2.5.x
To: Torque Developers mailing list <torquedev at supercluster.org>
Message-ID:
       <CADvLK3dhjWYNqjYgficcs9EEuey+KW7h=9HBvgq9axg9tDx8FA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Alan,

I'm sorry I did not get to this sooner. I would like to add this and test it.
We are up against a deadline to get torque 2.5.11 out. I would like to put
this in for 2.5.12 and get some testing for it. We can do a 2.5.12 release
when we feel confident in this fix.

Regards

Ken


------------------------------


End of torquedev Digest, Vol 76, Issue 5
****************************************



-- 
alan at madllama.net http://humbleville.blogspot.com