[torquedev] pbs_mom segfault in TMomCheckJobChild
jbernstein at penguincomputing.com
Sun Dec 21 16:02:23 MST 2008
On Dec 21, 2008, at 2:30 PM, Glen Beane wrote:
> On Sat, Dec 20, 2008 at 8:13 AM, Glen Beane <glen.beane at gmail.com>
> wrote:
>> On Wed, Dec 17, 2008 at 6:15 PM, Joshua Bernstein
>> <jbernstein at penguincomputing.com> wrote:
>>> Garrick Staples wrote:
>>>> After investigating both patches yesterday, I have to conclude that
>>>> neither is of merit. The close_conn() should never do the right
>>>> thing, and the usage of '&&' in this context is perfectly valid.
>>> Fair enough. But why is close(i) used there, when apparently in
>>> 2.4.0 it's been corrected to close_conn()? Further, close_conn() is
>>> used in many other similar functions, thus it seems like a valid fix.
>> I would *not* assume 2.4.0 is correct.
That's interesting to me. I would expect a later version of the code
to be more correct than an older version.
>> The problem with changing it to
>> close_conn() in this situation is that close_conn WON'T DO ANYTHING
>> because svr_conn[i].cn_active == Idle when it is called!
>> if (svr_conn[i].cn_active != Idle)
>> /* NOTE: breakout if state changed (probably received
>> shutdown request) */
>> if ((SState != NULL) && (OrigState != *SState))
>> else /* XXXXXXX svr_conn[i].cn_active == Idle !!!! */
Are there any other states cn_active could be in other than Idle or
not Idle? I see what you're saying now though...
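
To make the point above concrete, here is a minimal sketch of the
situation as described. The struct, the enum and every sketch_ name are
hypothetical stand-ins, not the real TORQUE definitions, and the early
return on Idle is assumed from the description quoted above rather than
copied from the source.

```c
/* Simplified stand-ins, NOT the real TORQUE types. */
enum sketch_state { SKETCH_IDLE, SKETCH_ACTIVE };

struct sketch_conn {
    enum sketch_state cn_active;
    int fd_open;               /* 1 while the descriptor is still open */
};

/*
 * Assumed behaviour: like the close_conn() described above, this
 * refuses to act on a slot that is already Idle.
 * Returns 1 if it closed something, 0 if it was a no-op.
 */
static int sketch_close_conn(struct sketch_conn *c)
{
    if (c->cn_active == SKETCH_IDLE)
        return 0;              /* nothing done, descriptor stays open */
    c->fd_open = 0;
    c->cn_active = SKETCH_IDLE;
    return 1;
}

/*
 * Mirrors the else branch in the quoted code: the slot is Idle but its
 * descriptor was still selected. use_close_conn picks between the
 * proposed patch (1) and the existing plain close (0).
 */
static int sketch_handle_idle_slot(struct sketch_conn *c, int use_close_conn)
{
    if (use_close_conn)
        return sketch_close_conn(c);
    c->fd_open = 0;            /* stands in for close(i) */
    return 1;
}
```

With an Idle slot, the close_conn-style path returns 0 and leaves
fd_open set, while the plain-close path clears it. That is the behaviour
Glen is pointing at.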
> Given what I describe above, I do not think this patch should be
> included at this time. There must be something else going on here.
> Does anyone else have any thoughts? I think Garrick and I are on the
> same page here, can anyone show me where I'm wrong with the code
> quoted above?
Hmmm. Well, there is still a problem here somewhere. Even if the
close_conn() patch isn't correct, it still seems the other patch,
errno=0, prevents pbs_mom from dying.
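
One general way an errno=0 reset can change behaviour, offered as a
guess and not as a diagnosis of this crash: errno is not cleared by a
successful call, so code that inspects errno without first checking the
return value can react to a failure left over from earlier. The sketch
below uses a hypothetical sketch_wait() that always succeeds. It is not
taken from pbs_mom.

```c
#include <errno.h>

/* Hypothetical stand-in for a call that succeeds and leaves errno alone. */
static int sketch_wait(void)
{
    return 0;
}

/* Fragile pattern: decides "failed" from errno alone. */
static int failed_by_errno_only(void)
{
    (void)sketch_wait();
    return errno != 0;
}

/* Safer pattern: trusts the return value, not a possibly stale errno. */
static int failed_by_return_value(void)
{
    return sketch_wait() == -1;
}
```

If errno still holds EBADF from some earlier call, the first function
reports a failure that never happened and the second does not. Setting
errno = 0 beforehand hides the difference, which may be why that patch
appears to help without explaining the root cause.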