[Mauiusers] Maui Crash, possibly linked to RESERVATIONDEPTH

Jason Williams jasonw at jhu.edu
Tue Aug 3 15:30:02 MDT 2010


Hey Folks,

Just an FYI: after a bit of research, I think I've figured out what
was causing this crash.  Code in the MResAdjustDRes() function
indirectly relies on the memory allocated in that function being
scrubbed (zeroed) at allocation time.  I changed the 4 malloc() calls
in that function to calloc() calls, which zero the memory as part of
the allocation.  That seems to have solved the issue.  The valgrind
report now looks quite a bit more manageable, and early indications
are that the crashing is a thing of the past.
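
For anyone who wants to see the general shape of that kind of change,
here is a minimal sketch.  It is not the Maui source: the struct and
the function names (demo_event_t, demo_events_*) are made up for
illustration.  It only shows that calloc() hands back zeroed memory,
the same state you would get from malloc() followed by memset().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type; NOT the real struct used in Maui. */
typedef struct {
    long time;
    int  count;
} demo_event_t;

/* malloc() leaves the contents indeterminate, so they must be
 * cleared by hand before anything reads them. */
static demo_event_t *demo_events_malloc_zeroed(size_t n)
{
    demo_event_t *e = malloc(n * sizeof(demo_event_t));
    if (e != NULL)
        memset(e, 0, n * sizeof(demo_event_t));
    return e;
}

/* calloc() returns memory with every byte already set to zero. */
static demo_event_t *demo_events_calloc(size_t n)
{
    return calloc(n, sizeof(demo_event_t));
}

/* Returns 1 if every byte of the n records is zero, else 0. */
static int demo_events_all_zero(const demo_event_t *e, size_t n)
{
    const unsigned char *p = (const unsigned char *)e;
    size_t i;

    assert(e != NULL);
    for (i = 0; i < n * sizeof(demo_event_t); i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

Swapping the allocator call, rather than adding a memset() after each
malloc(), keeps the change to one line per allocation.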

From what I gather, because the memory previously allocated by malloc()
in that function was not scrubbed, the structs it manipulates could be
missing their terminating NUL character (or zero value), and that
caused a buffer overflow.  I just committed my changes to trunk, so if
anyone would like to review them, feel free to do so and comment back.
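
To illustrate the failure mode I mean (again a sketch, not the actual
Maui code), assume a list of entries that ends at the first entry whose
type field is zero.  The demo_entry_t struct and the demo_list_*
functions are hypothetical names.  A loop like the one in
demo_list_len() is only safe if the terminator is really there, which
zeroed memory guarantees and uninitialised memory does not.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list entry; a zero 'type' marks the end of the list. */
typedef struct {
    int  type;   /* 0 = end-of-list sentinel */
    long time;
} demo_entry_t;

/* Allocate room for 'used' entries plus one terminator slot.
 * calloc() zeroes the whole block, so the sentinel is in place
 * before any entry is written. */
static demo_entry_t *demo_list_new(size_t used)
{
    return calloc(used + 1, sizeof(demo_entry_t));
}

/* Count entries up to the zero sentinel.  With uninitialised
 * memory the sentinel might be missing and this loop could read
 * past the end of the allocation. */
static size_t demo_list_len(const demo_entry_t *list)
{
    size_t n = 0;

    assert(list != NULL);
    while (list[n].type != 0)
        n++;
    return n;
}
```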


-- 
Jason Williams
Systems Administrator
Johns Hopkins University
Physics and Astronomy Department



Jason Williams wrote:
> I am seeing a very odd bug and am curious if anyone else has seen it.
> I'm running Maui 3.2.6p21, and any time I set RESERVATIONDEPTH higher
> than 1 and have more than 300 or so jobs in the queue, maui will crash
> on the bottom side of the scheduler iteration.  The higher the
> RESERVATIONDEPTH, the more likely I am to see the crash.  The error
> message from the crash (which I only found after figuring out that it
> was being thrown away because stderr is not redirected to stdout or to
> a more meaningful place) is:
> 
> *** glibc detected *** /opt/maui/sbin/maui: malloc(): memory corruption: 
> 0x0000000012247800 ***
> 
> Now the memory address is always the same, although I'm sure to the 
> outside observer its exact value is probably pretty meaningless.
> 
> So on a whim, I ran maui through valgrind to see if valgrind had any
> insight into what was going on.  The valgrind report is long(-ish) and
> not very encouraging, and I'm still a novice at running things through
> valgrind.  But most of what valgrind is complaining about (and which
> would be relevant) seems to be in the function MResAdjustDRes and all
> the magic that goes on in there.
> 
> I'm basically wondering if anyone else has seen this sort of behavior 
> and if they have a solution/workaround/patch they'd like to share.  I've 
> been looking at the code for about a day now and don't really see what 
> would be causing the valgrind errors let alone the crashing.  If you're 
> interested in the valgrind report, I can send it along.
> 
> Thanks
> 
> 

