[torqueusers] Re: mom segfault in new diag code

Dave Jackson jacksond at supercluster.org
Thu Oct 28 17:09:08 MDT 2004


Garrick,

  Our fault: we pushed out a patch that contained the bounds checking,
but the copy on the web was never updated.  The new code should
perform all required tlist() bounds checking.

Dave

On Thu, 2004-10-28 at 14:07, Garrick Staples wrote:
> tlist() isn't checking the bounds of Buf correctly.  As it recurses, BufSize is
> never recalculated.  
> 
> This seems to work correctly (but you might want only one strlen())...
> 
> 
> diff -ruN torque-1.1.0p4_orig/src/resmom/mom_server.c torque-1.1.0p4/src/resmom/mom_server.c
> --- torque-1.1.0p4_orig/src/resmom/mom_server.c	2004-10-25 13:11:01.000000000 -0700
> +++ torque-1.1.0p4/src/resmom/mom_server.c	2004-10-28 13:01:48.000000000 -0700
> @@ -189,10 +189,10 @@
>  
>    if (Buf[0] != '\0')
>      {
> -    strncat(Buf,",",BufSize);
> +    strncat(Buf,",",BufSize-strlen(Buf));
>      }
>  
> -  strncat(Buf,tmpLine,BufSize);
> +  strncat(Buf,tmpLine,BufSize-strlen(Buf));
>    
>    return;
>    }  /* END tlist() */
> 
> 
> On Thu, Oct 28, 2004 at 12:40:02PM -0700, Garrick Staples alleged:
> > torque-1.1.0p4-snap.1098735063 isn't segfaulting, but it still isn't right...
> > 
> > $ momctl -d 0 -h hpc1201
> > 
> > Host: hpc1201/hpc1201.usc.edu   Server: hpc-master   Version: torque_1.1.0p4
> > HomeDirectory:          /var/spool/torque/mom_priv
> > MOM active:             58 seconds
> > Last Msg From Server:   58 seconds (CLUSTER_ADDRS)
> > Last Msg To Server:     13 seconds
> > LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
> > JobList:                NONE
> > 
> > diagnostics complete
> > 
> > [ucs@hpc-master /root]$ momctl -d 1 -h hpc1201
> > .125.1.76,10.125.1.75,10.125.1.74,10.125.1.73,10.125.1.72,10.125.1.71,10.125.1.70,10.125.1.69,10.125.1.68,10.125.1.67,10.125.1.66,10.125.1.65,10.125.0.220,10.125.0.200,192.168.3.136,192.168.3.135,192.168.3.134,192.168.3.133,192.168.3.132,192.168.3.131,192.168.3.130,192.168.3.129,192.168.5.200,192.168.5.199,192.168.5.198,192.168.5.197,192.168.5.196,192.168.5.195,192.168.5.194,192.168.5.193,192.168.5.192,192.168.5.191,192.168.5.190,192.168.5. 189,192.168.5.188,1Trusted Client List: 10.125.2.70,10.125.2.69,10.125.2.68,10.125.2.67,10.125.2.66,10.125.2.65,10.125.2.64,10.125.2.63,10.125.2.62,10.125.2.61,10.125.2.60,10.125.2.59,10.125.2.58,10.125.2.57,10.125.2.56,10.125.2.55,10.125.2.54,10.125.2.53,10.125.2.5
> > 
> > And this repeats for about 18KB of garbage, all one line.  I haven't looked
> > yet, but I assume tlist() is still broken.  -d 2 and 3 do the same thing.
> > 
> > 
> > On Thu, Oct 28, 2004 at 09:27:08AM -0600, Dave Jackson alleged:
> > > Garrick,
> > > 
> > >   While the recent p4 snapshots fixed the tmpLine/output overflows,
> > > only the most recent one corrects the tlist() issue.  Thanks for reporting
> > > this and please let us know if things work better.
> > > 
> > > Dave


