[Mauiusers] Valgrind error & leak report for latest Maui snapshot (1125509278)

Dave Jackson jacksond at clusterresources.com
Mon Sep 5 18:04:06 MDT 2005


Chris,

  The first 4 items have already been corrected.  The remaining two are
new and will require some investigation.  Can you send us your maui.cfg
and maui checkpoint files?  With that info, we should be able to
reproduce these and get them fixed.

Thanks,
Dave



On Thu, 2005-09-01 at 22:23 +1000, Chris Samuel wrote:
> Hi folks,
> 
> Downloaded the latest Maui snapshot today (1125509278) and gave it a
> whirl on one of the clusters that was seeing previous snapshots die.
> 
> It seemed to stand up OK under reasonable load (100 trivial 'hostname'
> jobs submitted in quick succession from a for loop, to give it a rapidly
> changing workload) without dying, though the next week or so will be the
> real test as their cluster has just gone live.
> 
> Anyway, I started Maui under the latest version of valgrind (3.0.1) and
> ran one of those batches of 100 jobs through it. The report below shows
> some possible errors and some definite leaks.
> 
> Maui was compiled with CFLAGS="-g -O0"
> 
> Good hunting!
> 
> cheers!
> Chris
> 
> ==19144== Memcheck, a memory error detector.
> ==19144== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
> ==19144== Using LibVEX rev 1367, a library for dynamic binary translation.
> ==19144== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
> ==19144== Using valgrind-3.0.1, a dynamic binary instrumentation framework.
> ==19144== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
> ==19144==
> ==19144== My PID = 19144, parent PID = 2347.  Prog and args are:
> ==19144==    maui
> ==19144==    -d
> --19144--
> --19144-- Valgrind library directory: /usr/local/lib/valgrind
> --19144-- Command line
> --19144--    maui
> --19144--    -d
> --19144-- Startup, with flags:
> --19144--    --log-file=../valgrind-log
> --19144--    --leak-check=full
> --19144--    -v
> --19144-- Contents of /proc/version:
> --19144--   Linux version 2.4.21-32.ELsmp (bhcompile at tweety.build.redhat.com) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-52)) #1 SMP Fri Apr 15 21:17:59 EDT 2005
> --19144-- Reading syms from /usr/local/maui-3.2.6p14-snap.1125509278/sbin/maui (0x8048000)
> --19144-- Reading syms from /lib/ld-2.3.2.so (0x1B8E4000)
> --19144-- Reading syms from /usr/local/lib/valgrind/stage2 (0xB0000000)
> --19144-- Reading suppressions file: /usr/local/lib/valgrind/default.supp
> ==19144==
> --19144-- Reading syms from /usr/local/lib/valgrind/vg_preload_core.so (0x1B8FB000)
> --19144-- Reading syms from /usr/local/lib/valgrind/vgpreload_memcheck.so (0x1B8FD000)
> --19144-- REDIR: 0x1B8F5F70 (index) redirected to 0x1B9002F0 (index)
> --19144-- REDIR: 0x1B8F6110 (strlen) redirected to 0x1B9004F4 (strlen)
> --19144-- Reading syms from /lib/tls/libm-2.3.2.so (0x1B90F000)
> --19144-- Reading syms from /lib/tls/libc-2.3.2.so (0x1B931000)
> --19144-- Reading syms from /lib/libdl-2.3.2.so (0x1BA69000)
> --19144-- REDIR: 0x1B9A9800 (rindex) redirected to 0x1B900228 (rindex)
> --19144-- REDIR: 0x1B9A8E70 (strcpy) redirected to 0x1B90052C (strcpy)
> --19144-- REDIR: 0x1B9A94D0 (strlen) redirected to 0x1B9004D8 (strlen)
> --19144-- REDIR: 0x1B9A96B0 (strncmp) redirected to 0x1B9006E4 (strncmp)
> --19144-- REDIR: 0x1B9A8350 (strcmp) redirected to 0x1B90073C (strcmp)
> --19144-- REDIR: 0x1B9A9770 (strncpy) redirected to 0x1B9005DC (strncpy)
> --19144-- REDIR: 0x1B9A1E40 (malloc) redirected to 0x1B8FE9BA (malloc)
> --19144-- REDIR: 0x1B9AAD00 (memset) redirected to 0x1B900B58 (memset)
> --19144-- REDIR: 0x1B9A2500 (calloc) redirected to 0x1B8FFC8B (calloc)
> --19144-- REDIR: 0x1B9AB220 (memcpy) redirected to 0x1B9007C0 (memcpy)
> --19144-- REDIR: 0x1B9A1FC0 (free) redirected to 0x1B8FF4CF (free)
> --19144-- REDIR: 0x1B9A81E0 (index) redirected to 0x1B9002D0 (index)
> --19144-- REDIR: 0x1B9ABCD0 (rawmemchr) redirected to 0x1B900BE8 (rawmemchr)
> --19144-- REDIR: 0x1B9AAAE0 (memchr) redirected to 0x1B90079C (memchr)
> --19144-- REDIR: 0x1B9ABDA0 (strchrnul) redirected to 0x1B900BCC (strchrnul)
> --19144-- REDIR: 0x1B9AAEC0 (stpcpy) redirected to 0x1B900948 (stpcpy)
> --19144-- Reading syms from /lib/libnss_files-2.3.2.so (0x1BB7A000)
> ==19144== Conditional jump or move depends on uninitialised value(s)
> ==19144==    at 0x80B75C1: MUStrDup (MUtil.c:389)
> ==19144==    by 0x804A638: main (Server.c:134)
> ==19144==
> ==19144== Conditional jump or move depends on uninitialised value(s)
> ==19144==    at 0x80B7729: MUFree (MUtil.c:454)
> ==19144==    by 0x80B7601: MUStrDup (MUtil.c:399)
> ==19144==    by 0x804A638: main (Server.c:134)
> --19144-- REDIR: 0x1B9A9580 (strnlen) redirected to 0x1B9004B4 (strnlen)
> --19144-- REDIR: 0x1B9A2080 (realloc) redirected to 0x1B8FFD36 (realloc)
> --19144-- REDIR: 0x1B9AACA0 (memmove) redirected to 0x1B900B7C (memmove)
> --19144-- REDIR: 0x1B9AAC80 (bcmp) redirected to 0x1B900918 (bcmp)
> --19144-- REDIR: 0x1B9A8030 (strcat) redirected to 0x1B900330 (strcat)
> --19144-- discard syms at 0x1BB7A000-0x1BB86000 in /lib/libnss_files-2.3.2.so due to munmap()
> ==19144==
> ==19144== ERROR SUMMARY: 4 errors from 2 contexts (suppressed: 22 from 1)
> ==19144==
> ==19144== 2 errors in context 1 of 2:
> ==19144== Conditional jump or move depends on uninitialised value(s)
> ==19144==    at 0x80B7729: MUFree (MUtil.c:454)
> ==19144==    by 0x80B7601: MUStrDup (MUtil.c:399)
> ==19144==    by 0x804A638: main (Server.c:134)
> ==19144==
> ==19144== 2 errors in context 2 of 2:
> ==19144== Conditional jump or move depends on uninitialised value(s)
> ==19144==    at 0x80B75C1: MUStrDup (MUtil.c:389)
> ==19144==    by 0x804A638: main (Server.c:134)
> --19144--
> --19144-- supp:   22 Ugly strchr error in /lib/ld-2.3.2.so
> ==19144==
> ==19144== IN SUMMARY: 4 errors from 2 contexts (suppressed: 22 from 1)
> ==19144==
> ==19144== malloc/free: in use at exit: 1759286 bytes in 461 blocks.
> ==19144== malloc/free: 64708 allocs, 64247 frees, 6364046 bytes allocated.
> ==19144==
> ==19144== searching for pointers to 461 not-freed blocks.
> ==19144== checked 30023880 bytes.
> ==19144==
> ==19144==
> ==19144== 3 bytes in 1 blocks are definitely lost in loss record 1 of 55
> ==19144==    at 0x1B8FEA39: malloc (vg_replace_malloc.c:149)
> ==19144==    by 0x1B9A924F: strdup (in /lib/tls/libc-2.3.2.so)
> ==19144==    by 0x80B761D: MUStrDup (MUtil.c:402)
> ==19144==    by 0x804A638: main (Server.c:134)
> ==19144==
> ==19144==
> ==19144== 748 (16 direct, 732 indirect) bytes in 1 blocks are definitely lost in loss record 20 of 55
> ==19144==    at 0x1B8FEA39: malloc (vg_replace_malloc.c:149)
> ==19144==    by 0x8131E41: alloc_bs (PBSD_status.c:235)
> ==19144==    by 0x8131D68: PBSD_status_get (PBSD_status.c:170)
> ==19144==    by 0x8131CA5: PBSD_status (PBSD_status.c:125)
> ==19144==    by 0x812FE31: pbs_statserver (pbsD_statsrv.c:95)
> ==19144==    by 0x80FDF11: __MPBSSystemQuery (MPBSI.c:1092)
> ==19144==    by 0x80FCEF3: MPBSInitialize (MPBSI.c:415)
> ==19144==    by 0x80BC8FE: __MUTFunc (MUtil.c:4717)
> ==19144==    by 0x80BC89E: MUThread (MUtil.c:4690)
> ==19144==    by 0x80F4291: MRMInitialize (MRM.c:239)
> ==19144==    by 0x8110809: MSysStartServer (MSys.c:2801)
> ==19144==    by 0x804A789: main (Server.c:174)
> ==19144==
> ==19144==
> ==19144== 1657 (1440 direct, 217 indirect) bytes in 40 blocks are definitely lost in loss record 40 of 55
> ==19144==    at 0x1B8FFD11: calloc (vg_replace_malloc.c:279)
> ==19144==    by 0x811D9CC: MXMLCreateE (MXML.c:279)
> ==19144==    by 0x809EEE0: MGroupLoadCP (MGroup.c:196)
> ==19144==    by 0x8110FDE: MCPRestore (MCP.c:428)
> ==19144==    by 0x809F104: MGroupAdd (MGroup.c:277)
> ==19144==    by 0x8071B18: MCredSetDefaults (MCred.c:2326)
> ==19144==    by 0x810CEE3: MSysInitialize (MSys.c:319)
> ==19144==    by 0x804A5C9: main (Server.c:125)
> ==19144==
> ==19144==
> ==19144== 3456 bytes in 8 blocks are definitely lost in loss record 41 of 55
> ==19144==    at 0x1B8FFD11: calloc (vg_replace_malloc.c:279)
> ==19144==    by 0x809EAF2: MUserCreate (MUser.c:580)
> ==19144==    by 0x809E417: MUserAdd (MUser.c:293)
> ==19144==    by 0x8071B06: MCredSetDefaults (MCred.c:2324)
> ==19144==    by 0x810CEE3: MSysInitialize (MSys.c:319)
> ==19144==    by 0x804A5C9: main (Server.c:125)
> ==19144==
> ==19144==
> ==19144== 81936 bytes in 2 blocks are definitely lost in loss record 53 of 55
> ==19144==    at 0x1B8FFD11: calloc (vg_replace_malloc.c:279)
> ==19144==    by 0x808A252: MSRBuildHostList (MSR.c:2608)
> ==19144==    by 0x8089F34: MSRUpdate (MSR.c:2488)
> ==19144==    by 0x807F111: MSchedProcessJobs (MSched.c:6844)
> ==19144==    by 0x804A7EC: main (Server.c:187)
> ==19144==
> ==19144== LEAK SUMMARY:
> ==19144==    definitely lost: 86851 bytes in 52 blocks.
> ==19144==    indirectly lost: 949 bytes in 93 blocks.
> ==19144==      possibly lost: 0 bytes in 0 blocks.
> ==19144==    still reachable: 1671486 bytes in 316 blocks.
> ==19144==         suppressed: 0 bytes in 0 blocks.
> ==19144== Reachable blocks (those to which a pointer was found) are not shown.
> ==19144== To see them, rerun with: --show-reachable=yes
> --19144--  memcheck: sanity checks: 1455 cheap, 59 expensive
> --19144--  memcheck: auxmaps: 0 auxmap entries (0k, 0M) in use
> --19144--  memcheck: auxmaps: 0 searches, 0 comparisons
> --19144--  memcheck: secondaries: 461 issued (29504k, 28M)
> --19144--  memcheck: secondaries: 116 accessible and distinguished (7424k, 7M)
> --19144--     tt/tc: 55112 tt lookups requiring 70997 probes
> --19144--     tt/tc: 55112 fast-cache updates, 5 flushes
> --19144-- translate: new        14998 (357536 -> 5275431; ratio 147:10) [0 scs]
> --19144-- translate: dumped     0 (0 -> ??)
> --19144-- translate: discarded  342 (6216 -> ??)
> --19144-- scheduler: 72788173 jumps (bb entries).
> --19144-- scheduler: 1455/206263 major/minor sched events.
> --19144--    sanity: 1456 cheap, 59 expensive checks.
> --19144--    exectx: 4999 lists, 846 contexts (avg 0 per list)
> --19144--    exectx: 128981 searches, 128435 full compares (995 per 1000)
> --19144--    exectx: 2220 cmp2, 115 cmp4, 0 cmpAll
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
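
For readers of the log above: the two "Conditional jump or move depends on
uninitialised value(s)" reports and loss record 1 all pass through a
duplicate-and-replace string helper called from main. The sketch below is a
hypothetical helper written for illustration, not the Maui source; the names
dup_replace and demo are assumptions. It shows the general pattern that tends
to produce such reports: the helper reads the old pointer before freeing it,
so the caller has to initialise that pointer to NULL, and has to release the
final copy before the pointer goes out of use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (NOT taken from Maui): replace *dst with a heap copy
 * of src, freeing whatever *dst pointed to before. A NULL src just frees. */
int dup_replace(char **dst, const char *src)
{
    if (dst == NULL)
        return -1;

    /* This read of *dst is the kind of access Memcheck reports as
     * "depends on uninitialised value(s)" when the caller never set it. */
    if (*dst != NULL) {
        free(*dst);
        *dst = NULL;
    }

    if (src == NULL)
        return 0;

    size_t n = strlen(src) + 1;       /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, src, n);
    *dst = copy;
    return 0;
}

/* Correct usage: initialise first, release last. Returns 0 on success. */
int demo(void)
{
    char *arg = NULL;                     /* initialise before the first call */

    if (dup_replace(&arg, "-d") != 0)     /* 3 bytes: '-', 'd', '\0' */
        return 1;
    if (dup_replace(&arg, "maui") != 0)   /* frees the old copy first */
        return 1;
    assert(strcmp(arg, "maui") == 0);

    dup_replace(&arg, NULL);              /* release, so nothing is lost */
    return arg == NULL ? 0 : 1;
}
```

Calling demo() from a small main and running it under valgrind with
--leak-check=full should report neither kind of problem. Leaving out the
"= NULL" initialiser or the final release is one way to get the corresponding
report, although whether that is what happens at Server.c:134 can only be
confirmed against the Maui source.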


