[torqueusers] segfaulting pbs_moms: torque-2.3.6-2cri.x86_64

Douglas McNab d.mcnab at physics.gla.ac.uk
Thu Nov 12 09:25:50 MST 2009


Yeah, unfortunately I am learning gdb as I go.  It seems I have been spoiled
by Java stack traces in the past.

(gdb) bt full
#0  mom_server_find_by_ip (search_ipaddr=177078032) at mom_server.c:450
        __v = <value optimized out>
        pms = (mom_server *) 0x6cbb80
        addr = <value optimized out>
#1  0x000000000041965e in mom_server_valid_message_source (stream=0) at mom_server.c:2022
        addr = (struct sockaddr_in *) 0x14f0ee44
        pms = (mom_server *) 0x0
        id = 0x43be08 "mom_server_valid_message_source"
#2  0x0000000000419870 in is_request (stream=0, version=1, cmdp=0x7fffcb2774d8) at mom_server.c:2125
        command = <value optimized out>
        ret = 0
        pms = <value optimized out>
        ipaddr = <value optimized out>
        id = "is_request"
#3  0x0000000000416997 in do_rpp (stream=0) at mom_main.c:5351
        tmpI = <value optimized out>
        ret = 0
        proto = 4
        version = 1
        id = "do_rpp"
#4  0x0000000000416a52 in rpp_request (fd=<value optimized out>) at mom_main.c:5408
        stream = 0
        id = "rpp_request"
#5  0x00002ae6c4678bc8 in wait_request (waittime=<value optimized out>, SState=0x0) at ../Libnet/net_server.c:469
        i = 1
        n = 0
        now = <value optimized out>
        selset = {__fds_bits = {128, 0 <repeats 15 times>}}
        tmpLine =
"??\220??*\000\000 at q?\024\000\000\000\000????\000\000\000\000f\000\000\000\000\000\000\000\030\000\000\0000\000\000\000\200x'??\177\000\000?w'??\177",
'\0' <repeats 16 times>, "??\220w'??\177\000\000
\000\000\0002032?y'??\177\000\000??\220??*\000\000 at q?\024\000\000\000\000
\000\000\0002080?y'??\177\000\000\220v'??\177\000\000 at q?\024\000\000\000\000\020x'??\177\000\000?x'??\177\000\000
\000\000\0002080
z'??\177\000\000??\220??*\000\000`w'??\177\000\000\000\000\000\000\000\000\000\000\217?e\000\000\000"...
        timeout = {tv_sec = 1, tv_usec = 0}
        OrigState = 0
#6  0x0000000000416c1d in main_loop () at mom_main.c:8046
        myla = 4.9406564584124654e-324
        tmpTime = 0
        id = "main_loop"
#7  0x0000000000416ee1 in main (argc=1, argv=0x7fffcb277bc8) at mom_main.c:8148
        rc = 0
        tmpFD = <value optimized out>
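
The faulting statement in frame #0 is line 450, `ipaddr = ntohl(addr->sin_addr.s_addr);`, so my guess is that `addr` is a bad pointer at that moment. Below is a minimal sketch of the kind of guard I have in mind. It is not the TORQUE source: the `demo_server` struct, the flat table and `demo_find_by_ip` are hypothetical stand-ins I made up for illustration, and I am assuming the search address is in host byte order, as the `ntohl` call suggests.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-server record; NOT TORQUE's mom_server. */
struct demo_server {
    struct sockaddr_in *addr;  /* may be NULL if no peer address is known */
};

/*
 * Return the first entry whose address equals search_ipaddr
 * (host byte order), or NULL if none matches.
 */
struct demo_server *demo_find_by_ip(struct demo_server *table, size_t n,
                                    uint32_t search_ipaddr)
{
    /* A NULL table is only acceptable when it is also empty. */
    assert(table != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const struct sockaddr_in *addr = table[i].addr;

        if (addr == NULL)
            continue;  /* skip the entry instead of dereferencing NULL */

        if (ntohl(addr->sin_addr.s_addr) == search_ipaddr)
            return &table[i];
    }
    return NULL;
}
```

Note that a check like this only catches a NULL pointer. If `addr` is non-NULL but stale (frame #1 shows a non-NULL `addr` of 0x14f0ee44), the dereference would still crash and the real fix would have to be wherever that pointer is obtained.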

Cheers,

Dug


2009/11/12 Martin MOKREJŠ <mmokrejs at ribosome.natur.cuni.cz>

> Douglas McNab wrote:
> > Hi Folks,
> >
> > Thanks for all your replies.  I had thought that mixing versions was a
> > little unsafe.  However, I am a little confused as to why they can work
> > together for a period of time and then decide to segfault when the
> > server pings the moms.  So, to find an explanation, I have built a debug
> > build.  After debugging my segfaulting moms (torque-2.3.6-2cri.x86_64)
> > further with it, I seem to have moved a little closer to the problem.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > mom_server_find_by_ip (search_ipaddr=177078032) at mom_server.c:450
> > 450           ipaddr = ntohl(addr->sin_addr.s_addr);
> > (gdb) where
> > #0  mom_server_find_by_ip (search_ipaddr=177078032) at mom_server.c:450
> > #1  0x000000000041965e in mom_server_valid_message_source (stream=0) at mom_server.c:2022
> > #2  0x0000000000419870 in is_request (stream=0, version=1, cmdp=0x7fffcb2774d8) at mom_server.c:2125
> > #3  0x0000000000416997 in do_rpp (stream=0) at mom_main.c:5351
> > #4  0x0000000000416a52 in rpp_request (fd=<value optimized out>) at mom_main.c:5408
> > #5  0x00002ae6c4678bc8 in wait_request (waittime=<value optimized out>, SState=0x0) at ../Libnet/net_server.c:469
> > #6  0x0000000000416c1d in main_loop () at mom_main.c:8046
> > #7  0x0000000000416ee1 in main (argc=1, argv=0x7fffcb277bc8) at mom_main.c:8148
> > (gdb) print ipaddr
> > No symbol "ipaddr" in current context.
>
> Try "bt full" command instead in your next gdb session. ;-)
>
> M.
>



-- 
ScotGrid, Room 481, Kelvin Building, University of Glasgow
tel: +44(0)141 330 6439