Bug 49 - tracejob not working : memory error
Status: NEW
Product: TORQUE
Component: clients
Version: 2.5.x
Platform: PC Linux
Importance: P5 normal
Assigned To: Glen
 
Reported: 2010-01-29 03:40 MST by Vincent Liard
Modified: 2010-07-01 14:51 MDT (History)

Description Vincent Liard 2010-01-29 03:40:00 MST
Hello,

I have installed torque 2.5.0 on a small cluster
- head ubuntu (kernel 2.6.31-17-generic)
- nodes debian (kernel 2.6.32-trunk-686)
and I have successfully submitted some very simple test jobs.

However, when I run tracejob, I get the following error:

$ tracejob 18
*** glibc detected *** tracejob: malloc(): memory corruption: 0x093ed0c8 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xe74ff1]
/lib/tls/i686/cmov/libc.so.6[0xe77be3]
/lib/tls/i686/cmov/libc.so.6(__libc_malloc+0x58)[0xe79898]
/lib/tls/i686/cmov/libc.so.6(popen+0x21)[0xe67651]
tracejob[0x804906f]
tracejob[0x8049ad6]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0xe20b56]
tracejob[0x8048be1]
======= Memory map: ========
0014e000-00174000 r-xp 00000000 08:06 596937     /usr/local/lib/libtorque.so.2.0.0
00174000-00175000 r--p 00025000 08:06 596937     /usr/local/lib/libtorque.so.2.0.0
00175000-00176000 rw-p 00026000 08:06 596937     /usr/local/lib/libtorque.so.2.0.0
00176000-00213000 rw-p 00000000 00:00 0 
002e7000-00303000 r-xp 00000000 08:06 331747     /lib/libgcc_s.so.1
00303000-00304000 r--p 0001b000 08:06 331747     /lib/libgcc_s.so.1
00304000-00305000 rw-p 0001c000 08:06 331747     /lib/libgcc_s.so.1
00699000-006b4000 r-xp 00000000 08:06 328207     /lib/ld-2.10.1.so
006b4000-006b5000 r--p 0001a000 08:06 328207     /lib/ld-2.10.1.so
006b5000-006b6000 rw-p 0001b000 08:06 328207     /lib/ld-2.10.1.so
00bbb000-00bbc000 r-xp 00000000 00:00 0          [vdso]
00e0a000-00f48000 r-xp 00000000 08:06 344674     /lib/tls/i686/cmov/libc-2.10.1.so
00f48000-00f49000 ---p 0013e000 08:06 344674     /lib/tls/i686/cmov/libc-2.10.1.so
00f49000-00f4b000 r--p 0013e000 08:06 344674     /lib/tls/i686/cmov/libc-2.10.1.so
00f4b000-00f4c000 rw-p 00140000 08:06 344674     /lib/tls/i686/cmov/libc-2.10.1.so
00f4c000-00f4f000 rw-p 00000000 00:00 0 
08048000-0804b000 r-xp 00000000 08:06 564595     /usr/local/bin/tracejob
0804b000-0804c000 r--p 00002000 08:06 564595     /usr/local/bin/tracejob
0804c000-0804d000 rw-p 00003000 08:06 564595     /usr/local/bin/tracejob
093ed000-0940e000 rw-p 00000000 00:00 0          [heap]
b7600000-b7621000 rw-p 00000000 00:00 0 
b7621000-b7700000 ---p 00000000 00:00 0 
b7748000-b774a000 rw-p 00000000 00:00 0 
b7761000-b7763000 rw-p 00000000 00:00 0 
bf861000-bf876000 rw-p 00000000 00:00 0          [stack]

Is it me or tracejob?
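
For readers of the dump above, the address in the glibc message can be checked against the memory map to see which region it falls in. The following is a minimal sketch in Python, not part of TORQUE; the helper name find_mapping is hypothetical, and only a few of the map lines from the report are copied in.

```python
# A few mappings copied from the memory map in the report.
MAPS = """\
08048000-0804b000 r-xp 00000000 08:06 564595     /usr/local/bin/tracejob
0804c000-0804d000 rw-p 00003000 08:06 564595     /usr/local/bin/tracejob
093ed000-0940e000 rw-p 00000000 00:00 0          [heap]
bf861000-bf876000 rw-p 00000000 00:00 0          [stack]
"""


def find_mapping(maps, address):
    """Return (name, offset) of the mapping containing address, or None."""
    for line in maps.splitlines():
        fields = line.split()
        if not fields:
            continue
        # First field is "start-end" in hexadecimal; end is exclusive.
        start, end = (int(part, 16) for part in fields[0].split("-"))
        if start <= address < end:
            # The sixth field, when present, is the path or a tag like [heap].
            name = fields[5] if len(fields) > 5 else "(anonymous)"
            return name, address - start
    return None


# Address taken from the "malloc(): memory corruption" line.
name, offset = find_mapping(MAPS, 0x093ED0C8)
print(name, hex(offset))
```

The reported address lies 0xc8 bytes into the [heap] mapping, which is consistent with glibc finding damaged allocator bookkeeping when popen() calls malloc(). It does not by itself show where the damage was done.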
Comment 1 Al Taufer 2010-04-27 11:50:53 MDT
(In reply to comment #0)
I have been unable to get the 2.5.0 version of tracejob to fail on Ubuntu with
kernels 2.6.28-18-generic and 2.6.31-20-generic.  Are you seeing the failure on
the Ubuntu or the Debian machine?
Comment 2 Chris Samuel 2010-05-02 10:10:35 MDT
Vincent, does compiling this with -g give any extra debugging info in the
stack trace?

Have you tried running the command under valgrind or gdb (again compiled with
-g)?
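
As an illustration of what a -g build would allow: the tracejob frames in the backtrace are raw addresses, and binutils' addr2line can translate them to function names and source lines when given a binary with debug info. The sketch below, in Python, only extracts the tracejob addresses from the reported backtrace and assembles the command; the helper names are hypothetical, and it assumes the rebuilt binary is installed at /usr/local/bin/tracejob as in the memory map, with the same code layout as the one that crashed.

```python
import re

# Backtrace lines copied from the report.
BACKTRACE = """\
/lib/tls/i686/cmov/libc.so.6[0xe74ff1]
/lib/tls/i686/cmov/libc.so.6[0xe77be3]
/lib/tls/i686/cmov/libc.so.6(__libc_malloc+0x58)[0xe79898]
/lib/tls/i686/cmov/libc.so.6(popen+0x21)[0xe67651]
tracejob[0x804906f]
tracejob[0x8049ad6]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0xe20b56]
tracejob[0x8048be1]
"""

# One frame looks like: module, optional "(symbol+offset)", then "[0xADDR]".
FRAME = re.compile(
    r"^(?P<module>[^\[(]+)(?:\((?P<symbol>[^)]*)\))?\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def frames_for(backtrace, module):
    """Return the addresses of frames belonging to module, innermost first."""
    addrs = []
    for line in backtrace.splitlines():
        match = FRAME.match(line.strip())
        if match and match.group("module") == module:
            addrs.append(match.group("addr"))
    return addrs


def addr2line_command(binary, addrs):
    """Build an addr2line invocation: -f prints function names, -e names the binary."""
    return ["addr2line", "-f", "-e", binary] + addrs


tracejob_frames = frames_for(BACKTRACE, "tracejob")
command = addr2line_command("/usr/local/bin/tracejob", tracejob_frames)
print(" ".join(command))
```

The first address printed is the tracejob function that called popen(). Since glibc only notices the corruption at that point, the overwrite itself probably happened earlier, which is what a valgrind run would help locate.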
Comment 3 Vincent Liard 2010-05-03 01:17:29 MDT
Hi Chris and Al, sorry for the late answer. I wanted to re-check the bug I
reported first, but I no longer have torque 2.5.0 available. I'll try to
rebuild it by the end of the week and let you know.
Comment 4 Garrick Staples 2010-07-01 14:51:19 MDT
Michael, please look at the current trunk before commenting further.