[torqueusers] Re: mom segfault in new diag code

Dave Jackson jacksond at supercluster.org
Thu Oct 28 19:02:32 MDT 2004


  If you knew the number of bugs surrounding this small piece of
information, you would appreciate this capability.  It looks like things
are now fixed in TORQUE 1.1.0p3 and higher but in early versions there
were initialization issues, buffer overflow issues, and other memory
corruption issues which caused this list of trusted hosts to be
corrupt.  As you state, this information indicates simply which machines
this mom will trust as peers/sisters.  The truncation is probably OK, as
history has shown that this list is either corrupt and contains four
nodes or fewer, or fully populated.

  We have seen no issues since patch 3, so we may move this output to a
higher diag level.
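
  For anyone following along, here is a minimal sketch of bounds-checked
appending into a fixed-size buffer that also reports how many entries were
dropped, so truncation is not silent.  This is not the TORQUE code: the
names tlist_append and tlist_build, the comma-separated layout, and the
stop-at-first-overflow policy are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a TORQUE function): append one host to a
 * comma-separated list held in a fixed-size buffer.
 * Assumes buf already holds a NUL-terminated string shorter than cap.
 * Returns 1 if the host fit, 0 if it did not (buf is left unchanged). */
static int tlist_append(char *buf, size_t cap, const char *host)
{
    assert(buf != NULL && host != NULL && cap > 0);

    size_t used = strlen(buf);
    size_t hlen = strlen(host);
    size_t sep  = (used > 0) ? 1 : 0;      /* ',' before all but the first */

    /* One byte is always reserved for the terminating NUL. */
    if (sep + hlen > cap - 1 - used)
        return 0;

    if (sep)
        buf[used++] = ',';
    memcpy(buf + used, host, hlen + 1);    /* +1 copies the NUL as well */
    return 1;
}

/* Hypothetical helper: build the list from n hosts.  Stops at the first
 * host that does not fit and returns how many hosts were left out, so
 * the caller can report the truncation. */
static size_t tlist_build(char *buf, size_t cap,
                          const char *const *hosts, size_t n)
{
    assert(buf != NULL && cap > 0);

    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!tlist_append(buf, cap, hosts[i]))
            return n - i;                  /* this host and all later ones */
    }
    return 0;
}
```

  A caller could print the buffer and, when the returned count is non-zero,
add a note such as "(N more not shown)" after it.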


On Thu, 2004-10-28 at 18:18, Garrick Staples wrote:
> On Fri, Oct 29, 2004 at 09:57:57AM +1000, Chris Samuel alleged:
> > On Fri, 29 Oct 2004 09:09 am, Dave Jackson wrote:
> > 
> > >   Our fault, we pushed out a patch which contained the bounds checking
> > > but this patch failed to get updated on the web.  The new code should
> > > perform all required tlist bounds checking.
> > 
> > Thanks David, is that torque-1.1.0p4-snap.1099003850.tar.gz ?
> 1.1.0p4-snap.1099003850 indeed does real bounds checking.  But the one problem
> I see is that valid info is silently ignored once the end of the 1KB buffer is
> reached.
> At the same time, it's unclear to me why that information is important (or even
> what it represents).  It seems to just print the first 1KB set of IPs from my
> cluster.
