[torquedev] pbs_mom segfault in TMomCheckJobChild

Joshua Bernstein jbernstein at penguincomputing.com
Wed Dec 17 16:30:20 MST 2008



Garrick Staples wrote:
> On Wed, Dec 17, 2008 at 02:41:23PM -0800, Joshua Bernstein alleged:
>>
>> Garrick Staples wrote:
>>> On Tue, Dec 16, 2008 at 07:18:24PM -0500, Glen Beane alleged:
>>>> On Tue, Dec 16, 2008 at 7:17 PM, Glen Beane <glen.beane at gmail.com> wrote:
>>>>> On Tue, Dec 16, 2008 at 3:06 PM, Joshua Bernstein
>>>>> <jbernstein at penguincomputing.com> wrote:
>>>>>
>>>>>> if (i == -1)
>>>>>>       if (errno == EINTR)
>>>>>>          continue;
>>>>>>
>>>>>> The ordering is important.  Otherwise the compiler sees if (a && b)
>>>>>> and is allowed to look at 'b' first to handle short-circuit evaluation.
>>>>> I would NEVER use such a brain dead compiler.  Compound Boolean
>>>>> expressions are evaluated left to right.
>>>>> if (ptr == NULL && ptr->foo == bar) is never going to access a null
>>>>> pointer because a correct compiler is never going to do the ptr->foo
>>>>> == bar test first.
>>>> i mean if (ptr != NULL && ptr->foo == bar)
>>> According to the C faq (a reference that I deeply trust), these constructs
>>> are perfectly legal.  The || and && operators (and the ?: and comma
>>> operators) create sequence points between the operands and guarantee the
>>> order of evaluation.
>>>
>>> http://c-faq.com/expr/seqpointops.html
>>> http://c-faq.com/expr/shortcircuit.html
>>> http://c-faq.com/expr/seqpoints.html
>> Ah well. I stand corrected about the ordering issue. Though the fact 
>> later on that errno is assigned even if the read() call didn't fail 
>> still remains.
> 
> But nothing ever reads RC.  While I agree that it is sloppy to assign a
> possibly bogus value, I don't see an actual bug anywhere.  It's not a pointer
> that gets followed to a bogus memory address to segfault or bus error.  It's
> just an int that is never acted upon.  RC is never read once assigned the
> (possibly) bogus value in errno, right?

Agreed. Does RC have some sort of global scope, so that it is perhaps
read elsewhere? If so, I'd expect to see the segfault when the value is
read, not when it is assigned.
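
For reference, here is a minimal sketch of the pattern we have been
discussing. It is not the actual pbs_mom code: read_retry and its rc
out-parameter are hypothetical names I made up for illustration. It
consults errno only after read() has returned -1, and relies on &&
evaluating left to right, so rc never picks up a stale errno from an
earlier call.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not taken from TMomCheckJobChild.
 * Reads up to len bytes from fd, retrying when a signal interrupts read().
 * Returns the byte count (0 at EOF) or -1 on a real failure.
 * *rc is set to errno only on failure, and to 0 otherwise. */
ssize_t read_retry(int fd, void *buf, size_t len, int *rc)
{
    ssize_t n;

    for (;;) {
        n = read(fd, buf, len);
        /* && evaluates left to right, so errno is looked at only
         * after read() has actually reported a failure. */
        if (n == -1 && errno == EINTR)
            continue;
        break;
    }

    *rc = (n == -1) ? errno : 0;
    return n;
}
```

A caller would pass the address of a local int for rc and look at it
only when the return value is -1.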

What if, over several calls to this function, the region of memory
that once held a valid value for errno now contains a null pointer,
so that the assignment fails? Consider:

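#include <stddef.h>

int main(void) {
         int *i = NULL;   /* a null pointer */
         *i = 0;          /* store through it: undefined behavior */
         return 0;
}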

This produces a segfault. What further bolsters my theory is that
several jobs run through this code just fine, so it doesn't happen
every time we enter the function. But given this workload, we *always*
get the segfault within 10 minutes or so.
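
If anyone wants to check the small example without crashing their own
shell, here is a rough sketch that runs the null-pointer store in a
child process and reports which signal killed it. The function name
null_store_signal is hypothetical, and I am assuming a typical
Linux-like system where address zero is unmapped. Strictly speaking the
store is undefined behavior, so other platforms or compilers may behave
differently.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: perform the NULL-pointer store in a child and
 * return the signal that terminated it, 0 if it exited normally,
 * or -1 if fork() or waitpid() failed. */
int null_store_signal(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* volatile keeps the compiler from dropping the store */
        volatile int *p = NULL;
        *p = 0;
        _exit(0);   /* reached only if the store did not fault */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On the systems I have in mind the return value compares equal to
SIGSEGV, which is the same signal GDB reports for the crash.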

> I hate to beat the point, but it seems you are looking for 2 real bugs and I'd
> hate for you to stop looking at this point :)

I won't stop looking, and of course I'm convinced I'm right here, but 
I'm open to other suggestions as to what could be going wrong. I didn't 
guess here, GDB told me so. ;-)

-Joshua Bernstein
Software Engineer
Penguin Computing

