[torqueusers] Re: mom segfault in new diag code

Garrick Staples garrick at usc.edu
Thu Oct 28 13:40:02 MDT 2004


torque-1.1.0p4-snap.1098735063 isn't segfaulting, but it still isn't right...

$ momctl -d 0 -h hpc1201

Host: hpc1201/hpc1201.usc.edu   Server: hpc-master   Version: torque_1.1.0p4
HomeDirectory:          /var/spool/torque/mom_priv
MOM active:             58 seconds
Last Msg From Server:   58 seconds (CLUSTER_ADDRS)
Last Msg To Server:     13 seconds
LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
JobList:                NONE

diagnostics complete

[ucs at hpc-master /root]$ momctl -d 1 -h hpc1201
.125.1.76,10.125.1.75,10.125.1.74,10.125.1.73,10.125.1.72,10.125.1.71,10.125.1.70,10.125.1.69,10.125.1.68,10.125.1.67,10.125.1.66,10.125.1.65,10.125.0.220,10.125.0.200,192.168.3.136,192.168.3.135,192.168.3.134,192.168.3.133,192.168.3.132,192.168.3.131,192.168.3.130,192.168.3.129,192.168.5.200,192.168.5.199,192.168.5.198,192.168.5.197,192.168.5.196,192.168.5.195,192.168.5.194,192.168.5.193,192.168.5.192,192.168.5.191,192.168.5.190,192.168.5. 189,192.168.5.188,1Trusted Client List: 10.125.2.70,10.125.2.69,10.125.2.68,10.125.2.67,10.125.2.66,10.125.2.65,10.125.2.64,10.125.2.63,10.125.2.62,10.125.2.61,10.125.2.60,10.125.2.59,10.125.2.58,10.125.2.57,10.125.2.56,10.125.2.55,10.125.2.54,10.125.2.53,10.125.2.5

This repeats for about 18KB of garbage, all on one line.  I haven't looked
yet, but I assume tlist() is still broken.  -d 2 and -d 3 do the same thing.
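
For illustration only, here is a minimal sketch of how the appends in the
fragment quoted below could be bounded so that an oversized client list gets
truncated instead of running past the end of output.  The names
append_bounded() and add_client_line() are hypothetical and are not part of
the TORQUE source; the sketch also assumes the caller knows the real size of
the output buffer, which the quoted code does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in TORQUE): append src to dst without ever
 * writing more than dstsize bytes in total.  dst must already hold a
 * NUL-terminated string.  Returns 0 if all of src fit, -1 if the result
 * had to be truncated or dst was not usable. */
static int append_bounded(char *dst, size_t dstsize, const char *src)
{
  size_t used;
  size_t need;
  size_t room;

  assert(dst != NULL && src != NULL);

  if (dstsize == 0)
    return -1;

  used = strlen(dst);
  need = strlen(src);

  if (used >= dstsize)
    return -1;                     /* dst is not terminated inside dstsize */

  room = dstsize - used - 1;       /* bytes free before the final NUL */

  if (need > room)
    {
    /* copy what fits, terminate, and report the truncation */
    memcpy(dst + used, src, room);
    dst[used + room] = '\0';
    return -1;
    }

  memcpy(dst + used, src, need + 1);   /* includes the NUL */
  return 0;
}

/* Same three appends as the quoted fragment, each one bounded.
 * clients stands in for whatever tlist() wrote into tmpLine. */
static int add_client_line(char *output, size_t outsize, const char *clients)
{
  int rc = 0;

  rc |= append_bounded(output, outsize, "Trusted Client List:   ");
  rc |= append_bounded(output, outsize, clients);
  rc |= append_bounded(output, outsize, "\n");

  return (rc != 0) ? -1 : 0;
}
```

This only protects the strcat side.  If tlist() itself writes more than the
length it is given, tmpLine is still overrun before any of these appends run,
so the size limit would need to be enforced there as well.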


On Thu, Oct 28, 2004 at 09:27:08AM -0600, Dave Jackson alleged:
> Garrick,
> 
>   While most recent p4 snapshots fixed the tmpLine/output overflows,
> only the most recent corrects the tlist() issue.  Thanks for reporting
> this and please let us know if things work better.
> 
> Dave
> 
> On Wed, 2004-10-27 at 19:52, Garrick Staples wrote:
> > Actually, tlist() seems to be overflowing the buffer too.
> > 
> > On Wed, Oct 27, 2004 at 04:33:35PM -0700, Garrick Staples alleged:
> > > 
> > > torque-1.1.0p4-snap.1098121584
> > > 
> > > The new momctl diag code is segfaulting in mom_main.c:rm_request().  It seems
> > > that neither tmpLine nor output is large enough.  Specifically, the second
> > > strcat in this code is overflowing output:
> > > 
> > >             if (verbositylevel >= 1)
> > >               {
> > >               /* display okclient list */
> > >               
> > >               tmpLine[0] = '\0';
> > >                 
> > >               tlist(okclients,tmpLine,1024);
> > >               
> > >               strcat(output,"Trusted Client List:   ");
> > >               
> > >               strcat(output,tmpLine);
> > >               
> > >               strcat(output,"\n");
> > >               }
> > > 
> > > 
> > > -- 
> > > Garrick Staples, Linux/HPCC Administrator
> > > University of Southern California
> > 
> > 
> 

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California