[torquedev] pbs_mom crashing

Oliver Baltzer obaltzer at flagstonere.bm
Wed Jul 22 06:10:44 MDT 2009


Glen Beane wrote:
>
> can you run pbs_mom in valgrind and trigger the crash?  That might
> catch the stack corruption if that is the case
>
I still don't know how to reproduce the problem, but I did get a crash
while running pbs_mom under Valgrind. Valgrind itself then died with an
internal error (see the end of the output), so I am not sure whether I
am using it correctly:

# valgrind --leak-check=full --show-reachable=yes \
    ~obaltzer/build/rpm/BUILD/torque-2.3.7/src/resmom/.libs/pbs_mom -D
[...]
===== MD5 2CB7CFFBC89839C8774942FFB4B3F18C
pbs_mom: LOG_DEBUG::init_groups, pre-sigprocmask
pbs_mom: LOG_DEBUG::init_groups, post-initgroups
pbs_mom: LOG_DEBUG::open_std_file, successfully created/opened stdout/stderr file '/opt/torque/spool/2362412.cyclone.local.OU'
pbs_mom: LOG_DEBUG::open_std_file, successfully created/opened stdout/stderr file '/opt/torque/spool/2362412.cyclone.local.ER'
saving extra job info stdout=-1 stderr=-1 taskid=2 nodeid=0
saving extra job info stdout=-1 stderr=-1 taskid=2 nodeid=0
==7229== Invalid write of size 4
==7229==    at 0x42CD78: sessions (mom_mach.c:3124)
==7229==    by 0x417F29: gen_gen (mom_server.c:1092)
==7229==    by 0x417FC2: generate_server_status (mom_server.c:1180)
==7229==    by 0x4181A8: mom_server_all_update_stat (mom_server.c:1323)
==7229==    by 0x4170A7: main_loop (mom_main.c:8013)
==7229==    by 0x41751B: main (mom_main.c:8180)
==7229==  Address 0x4B8BC18 is not stack'd, malloc'd or (recently) free'd
==7229==
==7229== Invalid read of size 4
==7229==    at 0x42CC40: sessions (mom_mach.c:3094)
==7229==    by 0x417F29: gen_gen (mom_server.c:1092)
==7229==    by 0x417FC2: generate_server_status (mom_server.c:1180)
==7229==    by 0x4181A8: mom_server_all_update_stat (mom_server.c:1323)
==7229==    by 0x4170A7: main_loop (mom_main.c:8013)
==7229==    by 0x41751B: main (mom_main.c:8180)
==7229==  Address 0x4B8BA24 is 0 bytes after a block of size 300 alloc'd
==7229==    at 0x4905E12: realloc (vg_replace_malloc.c:306)
==7229==    by 0x42CDB0: sessions (mom_mach.c:3108)
==7229==    by 0x417F29: gen_gen (mom_server.c:1092)
==7229==    by 0x417FC2: generate_server_status (mom_server.c:1180)
==7229==    by 0x4181A8: mom_server_all_update_stat (mom_server.c:1323)
==7229==    by 0x4170A7: main_loop (mom_main.c:8013)
==7229==    by 0x41751B: main (mom_main.c:8180)
--7229-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--7229-- si_code=1;  Faulting address: 0x1F7A04B8DBA0;  sp: 0x402C80E10

valgrind: the 'impossible' happened:
   Killed by fatal signal
==7229==    at 0x7001A62C: vgPlain_arena_malloc (m_mallocfree.c:169)
==7229==    by 0x700332AA: vgPlain_cli_malloc (replacemalloc_core.c:101)
==7229==    by 0x70001EC1: vgMAC_realloc (mac_malloc_wrappers.c:377)
==7229==    by 0x70034F7D: do_client_request (scheduler.c:995)
==7229==    by 0x7003489E: vgPlain_scheduler (scheduler.c:721)
==7229==    by 0x700477CE: thread_wrapper (syswrap-linux.c:87)
==7229==    by 0x700478C5: run_a_thread_NORETURN (syswrap-linux.c:120)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
==7229==    at 0x4905E12: realloc (vg_replace_malloc.c:306)
==7229==    by 0x42CDB0: sessions (mom_mach.c:3108)
==7229==    by 0x417F29: gen_gen (mom_server.c:1092)
==7229==    by 0x417FC2: generate_server_status (mom_server.c:1180)
==7229==    by 0x4181A8: mom_server_all_update_stat (mom_server.c:1323)
==7229==    by 0x4170A7: main_loop (mom_main.c:8013)
==7229==    by 0x41751B: main (mom_main.c:8180)
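
Reading the trace above: the invalid read at mom_mach.c:3094 touches the
4 bytes immediately after a 300-byte block that was realloc'd at
mom_mach.c:3108 in the same function. That is consistent with an index
running one slot past the end of a growable array, though the trace alone
does not prove it. The following is only a minimal sketch of the usual
defensive pattern (grow first, then write), not the TORQUE code: the
names sid_list, sid_list_push and sid_list_free are hypothetical, and the
initial capacity of 75 entries is chosen just to mirror the 300-byte
block, assuming a 4-byte int.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of session IDs; NOT the TORQUE structure. */
struct sid_list {
    int    *ids;   /* heap buffer, NULL while the list is empty */
    size_t  used;  /* number of slots currently filled */
    size_t  cap;   /* number of slots allocated */
};

/*
 * Append one ID. The capacity check happens BEFORE the write, so
 * ids[used] is always inside the allocation when it is stored.
 * Returns 0 on success, -1 if realloc fails (list left unchanged).
 */
int sid_list_push(struct sid_list *l, int sid)
{
    assert(l != NULL);

    if (l->used == l->cap) {
        /* 75 entries is an assumption: 300 bytes with a 4-byte int. */
        size_t new_cap = l->cap ? l->cap * 2 : 75;
        int *tmp = realloc(l->ids, new_cap * sizeof *tmp);

        if (tmp == NULL)
            return -1;      /* old buffer is still valid and owned by l */

        l->ids = tmp;
        l->cap = new_cap;
    }

    l->ids[l->used++] = sid;
    return 0;
}

/* Release the buffer and reset the list to its empty state. */
void sid_list_free(struct sid_list *l)
{
    assert(l != NULL);
    free(l->ids);
    l->ids  = NULL;
    l->used = 0;
    l->cap  = 0;
}
```

A caller would start from a zero-initialised struct sid_list, call
sid_list_push() once per session found, and call sid_list_free() when
done. Keeping the realloc result in a temporary avoids losing the old
pointer if the allocation fails.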

Cheers,
Oliver
