[Mauiusers] Maui Crash, possibly linked to RESERVATIONDEPTH
jasonw at jhu.edu
Tue Aug 3 15:30:02 MDT 2010
Just an FYI: after a bit of research into what is going on, I think I've
figured out what was causing this error. In short, code in the
MResAdjustDRes() function indirectly relies on the memory allocated in
that function being scrubbed (zeroed) at allocation time. I changed the
4 malloc() calls in that function to calloc() calls, which zero the
memory transparently. That seems to have solved the issue: the valgrind
report is now much more manageable, and all early indications are that
the crashing will be a thing of the past.
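A minimal sketch of the kind of malloc-to-calloc change described above. It assumes a hypothetical record type; `struct hypo_res`, `alloc_res_zeroed()` and `alloc_res_memset()` are illustrative names, not Maui's actual structures or code in MResAdjustDRes(). It shows that calloc() returns zero-filled memory, and that malloc() followed by memset() is the explicit equivalent.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-in for a record allocated in MResAdjustDRes();
 * the real Maui structs differ. */
struct hypo_res {
    char name[64];          /* later code assumes this is NUL-terminated */
    struct hypo_res *next;  /* later code assumes NULL means "end of list" */
    long count;
};

/* After the change: calloc() zero-fills all n records, so name[] starts
 * as "", count is 0, and next is NULL on platforms where a null pointer
 * is all-bits-zero (true for glibc/Linux, not guaranteed by ISO C). */
static struct hypo_res *alloc_res_zeroed(size_t n)
{
    return calloc(n, sizeof(struct hypo_res));
}

/* Explicit equivalent: malloc() alone leaves the bytes uninitialized,
 * so they must be cleared with memset() before anyone reads them. */
static struct hypo_res *alloc_res_memset(size_t n)
{
    struct hypo_res *r;
    if (n > SIZE_MAX / sizeof *r)
        return NULL; /* n * size would overflow; calloc() checks this itself */
    r = malloc(n * sizeof *r);
    if (r != NULL)
        memset(r, 0, n * sizeof *r);
    return r;
}
```

calloc() is usually the simpler choice because it also guards against overflow in the `count * size` multiplication; the memset() form is useful when memory comes from an allocator that has no zeroing variant.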
From what I gather, because the memory previously allocated by malloc()
in that function was not scrubbed, a missing null character (zero value)
within the structs it manipulates caused a buffer overflow. I just
committed my changes to trunk, so if anyone else would like to review
them, feel free to do so and comment back.
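A hypothetical illustration of how a missing null character turns into an overflow. `copy_relying_on_zeroed()`, `copy_terminated()` and `NAME_CAP` are made-up names for this sketch, not code from MResAdjustDRes(). The first function copies bytes but never writes the terminator, so it only produces a valid string if the buffer was already zeroed (for example by calloc()); on a malloc()ed buffer, a later strlen() or strcpy() would run past the end looking for a '\0'. The second function terminates explicitly and is safe either way.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_CAP 16  /* hypothetical buffer size */

/* Copies at most cap-1 bytes of src into buf but never writes the NUL
 * itself: it silently depends on buf having been zeroed beforehand.
 * Requires cap >= 1. */
static void copy_relying_on_zeroed(char *buf, size_t cap, const char *src)
{
    size_t n = strlen(src);
    if (n > cap - 1)
        n = cap - 1;
    memcpy(buf, src, n);
}

/* Defensive variant: always terminates, so it is correct whether buf
 * came from malloc() or calloc(). Requires cap >= 1. */
static void copy_terminated(char *buf, size_t cap, const char *src)
{
    size_t n = strlen(src);
    if (n > cap - 1)
        n = cap - 1;   /* truncate rather than overflow */
    memcpy(buf, src, n);
    buf[n] = '\0';
}
```

Switching to calloc() fixes the first pattern without touching its callers; explicit termination, as in the second, removes the hidden dependency altogether.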
Johns Hopkins University
Physics and Astronomy Department
Jason Williams wrote:
> I am seeing a very odd bug and am curious if anyone else has seen it.
> I'm running Maui 3.2.6p21, and any time I set RESERVATIONDEPTH higher
> than 1 with more than 300 or so jobs in the queue, maui crashes near
> the end of a scheduler iteration. The higher the RESERVATIONDEPTH, the
> more likely I am to see the crash. The error message from the crash
> (which I only found after realizing the message was being thrown away
> because stderr wasn't redirected to stdout or somewhere more useful) is:
> *** glibc detected *** /opt/maui/sbin/maui: malloc(): memory corruption:
> 0x0000000012247800 ***
> Now the memory address is always the same, although I'm sure to the
> outside observer its exact value is probably pretty meaningless.
> So on a whim, I ran maui through valgrind to see if it had any insight
> into what was going on. The valgrind report is long(-ish) and not very
> encouraging, and I'm still a novice at running things through valgrind.
> But most of what valgrind complains about (and which seems relevant)
> involves the function MResAdjustDRes and all the magic that goes on in
> there.
> I'm basically wondering if anyone else has seen this sort of behavior
> and if they have a solution/workaround/patch they'd like to share. I've
> been looking at the code for about a day now and don't really see what
> would be causing the valgrind errors let alone the crashing. If you're
> interested in the valgrind report, I can send it along.