[torquedev] pbs_mom segfault in TMomCheckJobChild
Joshua Bernstein
jbernstein at penguincomputing.com
Sun Dec 21 16:02:23 MST 2008
On Dec 21, 2008, at 2:30 PM, Glen Beane wrote:
> On Sat, Dec 20, 2008 at 8:13 AM, Glen Beane <glen.beane at gmail.com>
> wrote:
>> On Wed, Dec 17, 2008 at 6:15 PM, Joshua Bernstein
>> <jbernstein at penguincomputing.com> wrote:
>>>
>>>
>>> Garrick Staples wrote:
>>>>
>>>> After investigating both patches yesterday, I have to conclude that
>>>> neither is of merit. The close_conn() should never do the right thing,
>>>> and the usage of '&&' in this context is perfectly valid.
>>>
>>> Fair enough. But why is close(i) used there, when apparently in 2.4.0
>>> it's been corrected to close_conn()? Further, close_conn() is used in
>>> many other similar functions, so it seems like a valid fix.
>>
>>
>> I would *not* assume 2.4.0 is correct.
That's interesting to me. I would expect a later version of the code
to be more correct than an older version.
>> The problem with changing it to
>> close_conn() in this situation is that close_conn WON'T DO ANYTHING
>> because svr_conn[i].cn_active == Idle when it is called!
>>
>> if (svr_conn[i].cn_active != Idle)
>>   {
>>   netcounter_incr();
>>
>>   svr_conn[i].cn_func(i);
>>
>>   /* NOTE: breakout if state changed (probably received
>>      shutdown request) */
>>
>>   if ((SState != NULL) && (OrigState != *SState))
>>     break;
>>   }
>> else /* XXXXXXX svr_conn[i].cn_active == Idle !!!! */
>>   {
>>   close_conn(i);
>>   }
Are there any states cn_active could be in other than Idle or not Idle?
I see what you're saying now, though...
> Given what I describe above, I do not think this patch should be
> included at this time. There must be something else going on here.
> Does anyone else have any thoughts? I think Garrick and I are on the
> same page here, can anyone show me where I'm wrong with the code
> quoted above?
Hmmm. Well, there is still a problem here somewhere. Even if the
close_conn() patch isn't correct, the other patch, which sets errno=0,
still seems to prevent pbs_mom from dying.
-Joshua Bernstein
Software Engineer
Penguin Computing