[torqueusers] Re: mom segfault in new diag code

Garrick Staples garrick at usc.edu
Thu Oct 28 19:52:28 MDT 2004


On Thu, Oct 28, 2004 at 07:02:32PM -0600, Dave Jackson alleged:
> Garrick,
> 
>   If you knew the number of bugs surrounding this small piece of
> information, you would appreciate this capability.  It looks like things
> are now fixed in TORQUE 1.1.0p3 and higher but in early versions there
> were initialization issues, buffer overflow issues, and other memory
> corruption issues which caused this list of trusted hosts to be
> corrupt.  As you state, this information indicates simply which machines
> this mom will trust as peers/sisters.  The truncation is probably ok as
> history has shown that either this list is corrupt and contains 4 nodes
> or less, or it is fully populated.
> 
>   We have seen no issues since patch 3 so we may move this output to a
> higher diag level.

Now if I could figure out how to restart pbs_mom without breaking jobs, I'd be
the happiest admin on the list :)

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California