[torquedev] pbs_mom segfault in TMomCheckJobChild

Joshua Bernstein jbernstein at penguincomputing.com
Sun Dec 21 16:02:23 MST 2008


On Dec 21, 2008, at 2:30 PM, Glen Beane wrote:

> On Sat, Dec 20, 2008 at 8:13 AM, Glen Beane <glen.beane at gmail.com>  
> wrote:
>> On Wed, Dec 17, 2008 at 6:15 PM, Joshua Bernstein
>> <jbernstein at penguincomputing.com> wrote:
>>>
>>>
>>> Garrick Staples wrote:
>>>>
>>>> After investigating both patches yesterday, I have to conclude that
>>>> neither is of merit.  The close_conn() should never do the right
>>>> thing, and the usage of '&&' in this context is perfectly valid.
>>>
>>> Fair enough. But why is close(i) used there, when apparently in
>>> 2.4.0 it's been corrected to close_conn()? Further, close_conn() is
>>> used elsewhere in many other similar functions, thus it seems like a
>>> valid fix.
>>
>>
>> I would *not* assume 2.4.0 is correct.

That's interesting to me. I would expect a later version of the code
to be more correct than an older version.

>> The problem with changing it to
>> close_conn() in this situation is that close_conn WON'T DO ANYTHING
>> because svr_conn[i].cn_active == Idle when it is called!
>>
>>    if (svr_conn[i].cn_active != Idle)
>>        {
>>        netcounter_incr();
>>
>>        svr_conn[i].cn_func(i);
>>
>>        /* NOTE:  breakout if state changed (probably received shutdown request) */
>>
>>        if ((SState != NULL) && (OrigState != *SState))
>>          break;
>>        }
>>      else  /* XXXXXXX svr_conn[i].cn_active == Idle !!!! */
>>        {
>>        close_conn(i);
>>        }

Are there any other states cn_active could be in other than Idle or
not Idle? I see what you're saying now, though...
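
To make the point concrete, here is a minimal stand-alone sketch (not
TORQUE code). It assumes, as described above, that close_conn() returns
early when the slot is already Idle, and that a slot can be marked Idle
while its descriptor is still open. The two-value enum, the fd_open
flag, and the raw_close() and make_stale() helpers are hypothetical
stand-ins for illustration only.

```c
#include <assert.h>

/* Hypothetical two-state model; the real enum may have more values. */
enum conn_state { Idle, Active };

struct conn
  {
  enum conn_state cn_active;
  int             fd_open;    /* 1 while the descriptor is still open */
  };

#define NCONN 4

static struct conn svr_conn[NCONN];

/* Modeled on the behaviour described above: a no-op for an Idle slot. */
static void close_conn(int i)
  {
  if ((i < 0) || (i >= NCONN))
    return;

  if (svr_conn[i].cn_active == Idle)
    return;                   /* early return: descriptor left untouched */

  svr_conn[i].fd_open   = 0;
  svr_conn[i].cn_active = Idle;
  }

/* Stand-in for a raw close(i): releases the descriptor regardless of state. */
static void raw_close(int i)
  {
  if ((i < 0) || (i >= NCONN))
    return;

  svr_conn[i].fd_open = 0;
  }

/* Puts slot i in the state seen by the else branch: Idle, but still open. */
static void make_stale(int i)
  {
  assert((i >= 0) && (i < NCONN));

  svr_conn[i].cn_active = Idle;
  svr_conn[i].fd_open   = 1;
  }
```

Under those assumptions, calling close_conn(1) after make_stale(1)
leaves fd_open at 1, while raw_close(1) clears it. That would be why
swapping close(i) for close_conn(i) in the else branch changes nothing.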


> Given what I describe above, I do not think this patch should be
> included at this time.  There must be something else going on here.
> Does anyone else have any thoughts?  I think Garrick and I are on the
> same page here,  can anyone show me where I'm wrong with the code
> quoted above?


Hmmm. Well, there still is a problem here somewhere. Even if the  
close_conn() patch isn't correct, it still seems the other patch of  
errno=0 prevents pbs_mom from dying.

-Joshua Bernstein
Software Engineer
Penguin Computing

