[torqueusers] Re: mom segfault in new diag code

Garrick Staples garrick at usc.edu
Thu Oct 28 14:07:46 MDT 2004


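tlist() isn't checking the bounds of Buf correctly.  As it recurses, BufSize is
never recalculated, so every strncat() call is allowed to append up to BufSize
characters no matter how much of Buf is already in use.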

This seems to work correctly (though you might want to call strlen() only once)...


diff -ruN torque-1.1.0p4_orig/src/resmom/mom_server.c torque-1.1.0p4/src/resmom/mom_server.c
--- torque-1.1.0p4_orig/src/resmom/mom_server.c	2004-10-25 13:11:01.000000000 -0700
+++ torque-1.1.0p4/src/resmom/mom_server.c	2004-10-28 13:01:48.000000000 -0700
@@ -189,10 +189,10 @@
 
   if (Buf[0] != '\0')
     {
-    strncat(Buf,",",BufSize);
+    strncat(Buf,",",BufSize-strlen(Buf)-1);
     }
 
-  strncat(Buf,tmpLine,BufSize);
+  strncat(Buf,tmpLine,BufSize-strlen(Buf)-1);
   
   return;
   }  /* END tlist() */


On Thu, Oct 28, 2004 at 12:40:02PM -0700, Garrick Staples alleged:
> torque-1.1.0p4-snap.1098735063 isn't segfaulting, but it still isn't right...
> 
> $ momctl -d 0 -h hpc1201
> 
> Host: hpc1201/hpc1201.usc.edu   Server: hpc-master   Version: torque_1.1.0p4
> HomeDirectory:          /var/spool/torque/mom_priv
> MOM active:             58 seconds
> Last Msg From Server:   58 seconds (CLUSTER_ADDRS)
> Last Msg To Server:     13 seconds
> LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
> JobList:                NONE
> 
> diagnostics complete
> 
> [ucs at hpc-master /root]$ momctl -d 1 -h hpc1201
> .125.1.76,10.125.1.75,10.125.1.74,10.125.1.73,10.125.1.72,10.125.1.71,10.125.1.70,10.125.1.69,10.125.1.68,10.125.1.67,10.125.1.66,10.125.1.65,10.125.0.220,10.125.0.200,192.168.3.136,192.168.3.135,192.168.3.134,192.168.3.133,192.168.3.132,192.168.3.131,192.168.3.130,192.168.3.129,192.168.5.200,192.168.5.199,192.168.5.198,192.168.5.197,192.168.5.196,192.168.5.195,192.168.5.194,192.168.5.193,192.168.5.192,192.168.5.191,192.168.5.190,192.168.5. 189,192.168.5.188,1Trusted Client List: 10.125.2.70,10.125.2.69,10.125.2.68,10.125.2.67,10.125.2.66,10.125.2.65,10.125.2.64,10.125.2.63,10.125.2.62,10.125.2.61,10.125.2.60,10.125.2.59,10.125.2.58,10.125.2.57,10.125.2.56,10.125.2.55,10.125.2.54,10.125.2.53,10.125.2.5
> 
> And this repeats for about 18KB of garbage, all one line.  I haven't looked
> yet, but I assume tlist() is still broken.  -d 2 and 3 do the same thing.
> 
> 
> On Thu, Oct 28, 2004 at 09:27:08AM -0600, Dave Jackson alleged:
> > Garrick,
> > 
> >   While most recent p4 snapshots fixed the tmpLine/output overflows,
> > only the most recent corrects the tlist() issue.  Thanks for reporting
> > this and please let us know if things work better.
> > 
> > Dave

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California