[Mauiusers] Possible Memory Corruption in maui

Jason Williams jasonw at Jhu.edu
Tue Nov 8 15:50:27 MST 2011


Dr. Stephan Raub,

Maui does have some very odd "memory management" in it that tends to 
cause these types of crashes in high-volume situations unless you apply 
some tweaks and/or make some concessions.  I've tracked down, and I think 
fixed, one such bug in the latest svn trunk, but 3.3.1 should already 
have that fix in it.

Have you tried running maui from the command line with the -d option 
and capturing the memory-corruption message and backtrace that it prints?  
Your original email has the strace output, but it cuts off part of the 
backtrace.  If I can get the full backtrace, I might be able to see 
where in the code it's having problems.


--
Jason Williams
Systems Engineer
Homewood High Performance Cluster
Johns Hopkins University

On 11/8/2011 12:09 PM, Dr. Stephan Raub wrote:
> Dear Mr. van der Vlies
>
> Currently we have 6095 jobs queued and 93 jobs running. Among these, we
> have some large job arrays (1000 and 4000 items per array).
>
> Best regards.
> --
> ---------------------------------------------------------
> | | Dr. rer. nat. Stephan Raub
> | | Dipl. Chem.
> | | High-Performance-Computing
> | | Zentrum für Informations- und Medientechnologie
> | | Heinrich-Heine-Universität Düsseldorf
> | | Universitätsstr. 1 / Raum 25.41.O2.25-2
> | | 40225 Düsseldorf / Germany
> | |
> | | Tel: +49-211-811-3911
> | | Fax: +49-211-811-2539
> ---------------------------------------------------------
>
> Important Note: This e-mail may contain trade secrets or privileged,
> undisclosed or otherwise confidential information. If you have received this
> e-mail in error, you are hereby notified that any review, copying or
> distribution of it is strictly prohibited. Please inform us immediately and
> destroy the original transmittal. Thank you for your cooperation.
>
>
>> -----Original Message-----
>> From: Bas van der Vlies [mailto:basv at sara.nl]
>> Sent: Tuesday, 8 November 2011 17:10
>> To: Dr. Stephan Raub
>> Subject: Re: [Mauiusers] Possible Memory Corruption in maui
>>
>> On 08-11-11 16:40, Dr. Stephan Raub wrote:
>>> Dear fellow maui users,
>>>
>>> we are running Maui 3.3.1 with torque 2.3.7 under RHEL5.5
>>> (2.6.8-194.26.1.el1) on a 600-somewhat core cluster.
>>>
>>> We experienced a sudden death of the maui scheduler with no message in the
>>> logs. We could not figure out a reason so we attached an "strace" to the
>>> maui process (as long as it was "still alive") and we got:
>>>
>> Dear Dr. Stephan Raub,
>>
>> just a question: How many jobs are in the queue?
>>
>> regards
>>
>>
>> --
>> ********************************************************************
>> *  Bas van der Vlies                    e-mail: basv at sara.nl       *
>> *  SARA - Academic Computing Services   Amsterdam, The Netherlands *
>> ********************************************************************
>
>
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers


