[Mauiusers] Possible Memory Corruption in maui

Dr. Stephan Raub raub at uni-duesseldorf.de
Wed Nov 9 02:38:23 MST 2011


Dear Jason Williams,

Thank you for your hint. Please find below the output of our Maui run with
the "-d" command-line option (Maui ran for about 5 minutes before it
crashed):

# /usr/local/maui/sbin/maui -d
*** glibc detected *** /usr/local/maui/sbin/maui: malloc(): memory corruption: 0x00000000099243e0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3300672fae]
/lib64/libc.so.6(__libc_malloc+0x6e)[0x3300674cde]
/usr/local/torque/lib/libtorque.so.2(decode_DIS_replyCmd+0x266)[0x2ab278cb18e6]
/usr/local/torque/lib/libtorque.so.2(PBSD_rdrpy+0x80)[0x2ab278cb56d0]
/usr/local/torque/lib/libtorque.so.2(PBSD_status_get+0x26)[0x2ab278cb6786]
/usr/local/maui/sbin/maui[0x4d9e59]
/usr/local/maui/sbin/maui[0x48b8e4]
/usr/local/maui/sbin/maui[0x48b84f]
/usr/local/maui/sbin/maui[0x4ce81c]
/usr/local/maui/sbin/maui[0x4ce39e]
/usr/local/maui/sbin/maui[0x4419eb]
/usr/local/maui/sbin/maui[0x403608]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x330061d994]
/usr/local/maui/sbin/maui[0x402cd9]
======= Memory map: ========
00400000-0054f000 r-xp 00000000 08:03 50266128 /usr/local/maui/sbin/maui
0074f000-00754000 rw-p 0014f000 08:03 50266128 /usr/local/maui/sbin/maui
00754000-02344000 rw-p 00754000 00:00 0
0984b000-188f1000 rw-p 0984b000 00:00 0 [heap]
3300200000-330021c000 r-xp 00000000 08:03 18186265 /lib64/ld-2.5.so
330041b000-330041c000 r--p 0001b000 08:03 18186265 /lib64/ld-2.5.so
330041c000-330041d000 rw-p 0001c000 08:03 18186265 /lib64/ld-2.5.so
3300600000-330074e000 r-xp 00000000 08:03 18186304 /lib64/libc-2.5.so
330074e000-330094d000 ---p 0014e000 08:03 18186304 /lib64/libc-2.5.so
330094d000-3300951000 r--p 0014d000 08:03 18186304 /lib64/libc-2.5.so
3300951000-3300952000 rw-p 00151000 08:03 18186304 /lib64/libc-2.5.so
3300952000-3300957000 rw-p 3300952000 00:00 0
3300a00000-3300a02000 r-xp 00000000 08:03 18186457 /lib64/libdl-2.5.so
3300a02000-3300c02000 ---p 00002000 08:03 18186457 /lib64/libdl-2.5.so
3300c02000-3300c03000 r--p 00002000 08:03 18186457 /lib64/libdl-2.5.so
3300c03000-3300c04000 rw-p 00003000 08:03 18186457 /lib64/libdl-2.5.so
3300e00000-3300e82000 r-xp 00000000 08:03 18186543 /lib64/libm-2.5.so
3300e82000-3301081000 ---p 00082000 08:03 18186543 /lib64/libm-2.5.so
3301081000-3301082000 r--p 00081000 08:03 18186543 /lib64/libm-2.5.so
3301082000-3301083000 rw-p 00082000 08:03 18186543 /lib64/libm-2.5.so
3303a00000-3303a0d000 r-xp 00000000 08:03 18186545 /lib64/libgcc_s-4.1.2-20080825.so.1
3303a0d000-3303c0d000 ---p 0000d000 08:03 18186545 /lib64/libgcc_s-4.1.2-20080825.so.1
3303c0d000-3303c0e000 rw-p 0000d000 08:03 18186545 /lib64/libgcc_s-4.1.2-20080825.so.1
3304a00000-3304a15000 r-xp 00000000 08:03 18186491 /lib64/libselinux.so.1
3304a15000-3304c15000 ---p 00015000 08:03 18186491 /lib64/libselinux.so.1
3304c15000-3304c17000 rw-p 00015000 08:03 18186491 /lib64/libselinux.so.1
3304c17000-3304c18000 rw-p 3304c17000 00:00 0
3304e00000-3304e3b000 r-xp 00000000 08:03 18186479 /lib64/libsepol.so.1
3304e3b000-330503b000 ---p 0003b000 08:03 18186479 /lib64/libsepol.so.1
330503b000-330503c000 rw-p 0003b000 08:03 18186479 /lib64/libsepol.so.1
330503c000-3305046000 rw-p 330503c000 00:00 0
3305e00000-3305e02000 r-xp 00000000 08:03 18186469 /lib64/libkeyutils-1.3.so
3305e02000-3306001000 ---p 00002000 08:03 18186469 /lib64/libkeyutils-1.3.so
3306001000-3306002000 rw-p 00001000 08:03 18186469 /lib64/libkeyutils-1.3.so
3306200000-3306211000 r-xp 00000000 08:03 18186474 /lib64/libresolv-2.5.so
3306211000-3306411000 ---p 00011000 08:03 18186474 /lib64/libresolv-2.5.so
3306411000-3306412000 r--p 00
Aborted
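For readers triaging a dump like the one above, here is a minimal Python sketch (not part of the original thread) that matches each backtrace frame against the memory map glibc prints, which uses the /proc/self/maps column layout. The helper names parse_maps and locate are hypothetical, and the sketch assumes frames of the form "module(symbol+off)[0xaddr]" or "module[0xaddr]" exactly as shown. Lines that are wrapped or truncated (such as the final "r--p 00" line) are simply skipped.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# e.g. "00400000-0054f000 r-xp 00000000 08:03 50266128 /usr/local/maui/sbin/maui"
MAP_RE = re.compile(
    r"^(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+)\s+(?P<perms>\S{4})\s+"
    r"(?P<offset>[0-9a-f]+)\s+\S+\s+\d+\s*(?P<path>.*)$"
)
# e.g. "/usr/local/torque/lib/libtorque.so.2(PBSD_rdrpy+0x80)[0x2ab278cb56d0]"
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)(?:\((?P<sym>[^)]*)\))?\[(?P<addr>0x[0-9a-f]+)\]$"
)


@dataclass
class Mapping:
    start: int
    end: int
    perms: str
    offset: int  # file offset of the mapping's first byte
    path: str    # empty string for anonymous mappings


def parse_maps(lines: List[str]) -> List[Mapping]:
    """Parse memory-map lines; non-matching (wrapped/truncated) lines are skipped."""
    maps = []
    for line in lines:
        m = MAP_RE.match(line.strip())
        if m:
            maps.append(Mapping(int(m["start"], 16), int(m["end"], 16),
                                m["perms"], int(m["offset"], 16),
                                m["path"].strip()))
    return maps


def locate(frame: str, maps: List[Mapping]) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (module named in the frame, mapped file path or None, file offset or None)."""
    m = FRAME_RE.match(frame.strip())
    if not m:
        raise ValueError(f"not a backtrace frame: {frame!r}")
    addr = int(m["addr"], 16)
    for mp in maps:
        if mp.start <= addr < mp.end:
            # Position of the address inside the mapped file.
            return m["module"], mp.path, addr - mp.start + mp.offset
    # Address is outside every parsed mapping (e.g. the map was cut off).
    return m["module"], None, None


MAP_LINES = [
    "00400000-0054f000 r-xp 00000000 08:03 50266128 /usr/local/maui/sbin/maui",
    "0984b000-188f1000 rw-p 0984b000 00:00 0 [heap]",
    "3300600000-330074e000 r-xp 00000000 08:03 18186304 /lib64/libc-2.5.so",
]
FRAMES = [
    "/lib64/libc.so.6(__libc_malloc+0x6e)[0x3300674cde]",
    "/usr/local/torque/lib/libtorque.so.2(decode_DIS_replyCmd+0x266)[0x2ab278cb18e6]",
    "/usr/local/maui/sbin/maui[0x4d9e59]",
]

if __name__ == "__main__":
    maps = parse_maps(MAP_LINES)
    for frame in FRAMES:
        module, path, off = locate(frame, maps)
        where = f"{path} +{off:#x}" if path is not None else "not in the printed map"
        print(f"{module}: {where}")
```

With these sample lines, the maui frame falls in the r-xp mapping of /usr/local/maui/sbin/maui, which is loaded at 0x400000 with file offset 0. Assuming a non-PIE binary built with debug info, the raw address (e.g. 0x4d9e59) could then be passed to "addr2line -e /usr/local/maui/sbin/maui". The libtorque.so.2 frames fall outside every printed mapping because the dump stops at "Aborted", so for those only the symbol+offset that glibc already printed (e.g. decode_DIS_replyCmd+0x266) is available.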

Thank you for your efforts.

Stephan
--
---------------------------------------------------------
| | Dr. rer. nat. Stephan Raub
| | Dipl. Chem.
| | High-Performance-Computing
| | Zentrum für Informations- und Medientechnologie 
| | Heinrich-Heine-Universität Düsseldorf
| | Universitätsstr. 1 / Raum 25.41.O2.25-2
| | 40225 Düsseldorf / Germany
| |
| | Tel: +49-211-811-3911
| | Fax: +49-211-811-2539
---------------------------------------------------------


Important Note: This e-mail may contain trade secrets or privileged,
undisclosed or otherwise confidential information. If you have received this
e-mail in error, you are hereby notified that any review, copying or
distribution of it is strictly prohibited. Please inform us immediately and
destroy the original transmittal. Thank you for your cooperation.

> -----Original Message-----
> From: mauiusers-bounces at supercluster.org [mailto:mauiusers-
> bounces at supercluster.org] On Behalf Of Jason Williams
> Sent: Tuesday, 8 November 2011 23:50
> To: mauiusers at supercluster.org
> Subject: Re: [Mauiusers] Possible Memory Corruption in maui
> 
> Dr Stephan Raub,
> 
> Maui does have some very odd "memory management" in it that tends to
> cause these kinds of crashes when run in high-volume situations without
> some tweaks and/or concessions. I've tracked down, and I think fixed, one
> such bug in the latest svn trunk, but 3.3.1 should already include that
> fix.
> 
> Could you try (or have you already tried) running maui from the command
> line with the -d option and capturing the memory-corruption message and
> backtrace it prints? Your original email has the strace output, but it
> cuts off part of the backtrace. If I can get the full backtrace, I might
> be able to see where in the code it's having problems.
> 
> 
> --
> Jason Williams
> Systems Engineer
> Homewood High Performance Cluster
> Johns Hopkins University
> 
> On 11/8/2011 12:09 PM, Dr. Stephan Raub wrote:
> > Dear Mr. van der Vlies,
> >
> > Currently we have 6095 jobs queued and 93 jobs running. Among these
> > are some large job arrays (1000 and 4000 items per array).
> >
> > Best regards.
> >
> >> -----Original Message-----
> >> From: Bas van der Vlies [mailto:basv at sara.nl]
> >> Sent: Tuesday, 8 November 2011 17:10
> >> To: Dr. Stephan Raub
> >> Subject: Re: [Mauiusers] Possible Memory Corruption in maui
> >>
> >> On 08-11-11 16:40, Dr. Stephan Raub wrote:
> >>> Dear fellow maui users,
> >>>
> >>> we are running Maui 3.3.1 with Torque 2.3.7 under RHEL 5.5
> >>> (2.6.8-194.26.1.el1) on a cluster of roughly 600 cores.
> >>>
> >>> We experienced a sudden death of the maui scheduler with no message
> >>> in the logs. We could not figure out a reason, so we attached strace
> >>> to the maui process (while it was still alive) and got:
> >>>
> >> Dear Dr. Stephan Raub,
> >>
> >> Just a question: how many jobs are in the queue?
> >>
> >> regards
> >>
> >>
> >> --
> >> ********************************************************************
> >> *  Bas van der Vlies                    e-mail: basv at sara.nl       *
> >> *  SARA - Academic Computing Services   Amsterdam, The Netherlands *
> >> ********************************************************************
> >
> >
> > _______________________________________________
> > mauiusers mailing list
> > mauiusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/mauiusers
