[torquedev] "Fixing" qsig -s USR1 and kill_delay on torque 2.5.x

Alan Wild alan at madllama.net
Sun Apr 15 19:51:10 MDT 2012


I got some more time this evening to compare my patch against the latest
3.0.5 snapshot.  It turns out that the 2.x patch applies cleanly against
the 3.x tree.  I think the last version of the 2.x patch was eaten by the
mail server during the outage, so here it is again.


-%<---%<--CUT HERE---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<--
diff -rN -U4 torque-3.0.5-snap.201204051313-old/src/resmom/start_exec.c torque-3.0.5-snap.201204051313/src/resmom/start_exec.c
--- torque-3.0.5-snap.201204051313-old/src/resmom/start_exec.c  2012-04-05 14:13:16.000000000 -0500
+++ torque-3.0.5-snap.201204051313/src/resmom/start_exec.c      2012-04-15 20:42:00.000000000 -0500
@@ -2048,8 +2048,13 @@
   if (TJE->is_interactive == FALSE)
     {
     int k;

+    if (strlen(buf)+5 <= MAXPATHLEN) {
+        memmove(buf+5,buf,strlen(buf)+1);
+        strncpy(buf, "exec ", 5);
+    }
+
     /* pass name of shell script on pipe */
     /* will be stdin of shell  */

     close(TJE->pipe_script[0]);
@@ -3732,9 +3737,9 @@
       {
       arg[aindex] = malloc(
                           strlen(path_jobs) +
                           strlen(pjob->ji_qs.ji_fileprefix) +
-                          strlen(JOB_SCRIPT_SUFFIX) + 1);
+                          strlen(JOB_SCRIPT_SUFFIX) + 6);

       if (arg[aindex] == NULL)
         {
         log_err(errno,id,"cannot alloc env");
@@ -3745,9 +3750,10 @@

         return(-1);
         }

-      strcpy(arg[aindex], path_jobs);
+      strcpy(arg[aindex], "exec ");
+      strcat(arg[aindex], path_jobs);
       strcat(arg[aindex], pjob->ji_qs.ji_fileprefix);
       strcat(arg[aindex], JOB_SCRIPT_SUFFIX);

       arg[aindex + 1] = NULL;
-%<---%<--CUT HERE---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<--
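
For readers who want to try the first hunk's string handling outside of
pbs_mom, here is a minimal standalone sketch of the same in-place "exec "
prefixing.  The helper name prepend_exec() and its explicit capacity
argument are hypothetical and are not part of TORQUE; the patch itself
checks against MAXPATHLEN, and the declared size of buf is not visible in
the diff, so the sketch takes the buffer size as a parameter instead of
assuming it.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not TORQUE code): prefix the NUL-terminated string
 * in buf with "exec ", in place.  cap is the total size of buf in bytes.
 * Returns 0 on success, or -1 if the result would not fit, in which case
 * buf is left unchanged.
 */
static int prepend_exec(char *buf, size_t cap)
{
    static const char prefix[] = "exec ";
    const size_t plen = sizeof(prefix) - 1;   /* 5, not counting the NUL */
    const size_t len = strlen(buf);

    /* Need room for the prefix, the original text, and the terminator. */
    if (len + plen + 1 > cap)
        return -1;

    /* Source and destination overlap, so memmove() rather than memcpy(). */
    memmove(buf + plen, buf, len + 1);        /* shifts the NUL as well */

    /* Copy only the five prefix bytes so the shifted text is not cut off. */
    memcpy(buf, prefix, plen);

    return 0;
}
```

Called on a buffer holding a script path, this leaves "exec " followed by
the original path in the same buffer, which is the string the first hunk
writes to the shell's stdin.  With the prefix, the shell replaces itself
with the job script instead of running it as a child.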


-Alan


On Wed, Mar 28, 2012 at 2:47 PM, Alan Wild <alan at madllama.net> wrote:


> We still don't have permission to install torque-4.0.0, even on our test
> systems.  However, I thought I would take a look at the source for pbs_mom
> to see how it works.  It appears, overall, very similar to 2.5.11, so I
> attempted to port my patch to 4.0.0.
>
> This does compile, but I can't comment on whether or not it will run.  :)
>
>
> -%<---%<--CUT HERE---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<--
> diff -rN -U4 torque-4.0.0/src/resmom/start_exec.c torque-4.0.0-new/src/resmom/start_exec.c
> --- torque-4.0.0/src/resmom/start_exec.c        2012-02-21 17:43:51.000000000 -0600
> +++ torque-4.0.0-new/src/resmom/start_exec.c    2012-03-28 14:34:37.000000000 -0500
> @@ -2213,8 +2213,13 @@
>    if (TJE->is_interactive == FALSE)
>       {
>      int k;
>
> +    if (strlen(buf)+5 <= MAXPATHLEN) {
> +        memmove(buf+5,buf,strlen(buf)+1);
> +        strncpy(buf, "exec ", 5);
> +    }
> +
>      /* pass name of shell script on pipe */
>      /* will be stdin of shell  */
>
>      close(TJE->pipe_script[0]);
> @@ -3881,9 +3886,9 @@
>        {
>        arg[aindex] = calloc(1,
>                             strlen(path_jobs) +
>                            strlen(pjob->ji_qs.ji_fileprefix) +
> -                          strlen(JOB_SCRIPT_SUFFIX) + 1);
> +                          strlen(JOB_SCRIPT_SUFFIX) + 6);
>
>        if (arg[aindex] == NULL)
>          {
>          log_err(errno,id,"cannot alloc env");
> @@ -3894,9 +3899,10 @@
>
>          return(-1);
>           }
>
> -      strcpy(arg[aindex], path_jobs);
> +      strcpy(arg[aindex], "exec ");
> +      strcat(arg[aindex], path_jobs);
>        strcat(arg[aindex], pjob->ji_qs.ji_fileprefix);
>        strcat(arg[aindex], JOB_SCRIPT_SUFFIX);
>
>        arg[aindex + 1] = NULL;
> -%<---%<--CUT HERE---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<--
>
>
> -Alan
>
>
>  On Mon, Mar 26, 2012 at 11:15 PM, Alan Wild <alan at madllama.net> wrote:
>
>
>> Sorry, real world intruded the last couple of weeks and I haven't had a
>> chance to dive back into this.  Yes, for users building Torque with
>> SHELL_USE_ARGV == 1, you would need to modify TMomFinalizeChild().
>> However, we don't build Torque this way, so I haven't had a chance to
>> really test this.  Regardless, I took a stab at making the patch more
>> complete.
>>
>> This is against the released 2.5.11:
>>
>>
>> diff -rN -U2 torque-2.5.11/src/resmom/start_exec.c torque-2.5.11-new/src/resmom/start_exec.c
>> --- torque-2.5.11/src/resmom/start_exec.c       2012-03-08 15:34:57.000000000 -0600
>> +++ torque-2.5.11-new/src/resmom/start_exec.c   2012-03-26 23:03:56.000000000 -0500
>> @@ -1997,4 +1997,9 @@
>>      int k;
>>
>> +    if (strlen(buf)+5 <= MAXPATHLEN) {
>> +        memmove(buf+5,buf,strlen(buf)+1);
>> +        strncpy(buf, "exec ", 5);
>> +    }
>> +
>>      /* pass name of shell script on pipe */
>>      /* will be stdin of shell  */
>> @@ -3641,5 +3646,5 @@
>>                            strlen(path_jobs) +
>>                            strlen(pjob->ji_qs.ji_fileprefix) +
>> -                          strlen(JOB_SCRIPT_SUFFIX) + 1);
>> +                          strlen(JOB_SCRIPT_SUFFIX) + 6);
>>
>>        if (arg[aindex] == NULL)
>> @@ -3654,5 +3659,6 @@
>>          }
>>
>> -      strcpy(arg[aindex], path_jobs);
>> +      strcpy(arg[aindex], "exec ");
>> +      strcat(arg[aindex], path_jobs);
>>        strcat(arg[aindex], pjob->ji_qs.ji_fileprefix);
>>        strcat(arg[aindex], JOB_SCRIPT_SUFFIX);
>>
>> I would love to know if anyone other than me has played with this patch
>> and whether or not it's looking viable.
>>
>> -Alan
>>
>>
>> On Mon, Mar 19, 2012 at 4:52 AM, <torquedev-request at supercluster.org> wrote:
>> >
>> > I definitely agree that exec'ing the script is the correct way to
>> > spawn it.  I think the patch is reasonable.
>> >
>> > Would "exec " also need to be added to the shell command line in
>> > TMomFinalizeChild() in the SHELL_USE_ARGV == 1 case?
>> >
>> > Michael
>>
>> --
>> alan at madllama.net http://humbleville.blogspot.com
>>
>
>
> --
> alan at madllama.net http://humbleville.blogspot.com
>
>


-- 
alan at madllama.net http://humbleville.blogspot.com