[torqueusers] Three more possible 2.5 (beta) bugs

Stuart Barkley stuartb at 4gh.net
Thu Jul 22 17:33:55 MDT 2010


On Thu, 22 Jul 2010 at 11:03 -0000, Stuart Barkley wrote:

> Problem 1: pbs_server crash:

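Debug info follows.  Both failures look the same: pbs_server aborts in
insert_link, called from svr_enquejob while handling a move-job
request.  One occurred during a Moab restart with several jobs to
remap.  The other occurred when several jobs were submitted in quick
succession.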

# pbs_server -D
pbs_server is up
Assertion failed, bad pointer in insert_link
ERROR:  bad new->ll_prior pointer in insert_link
ERROR:  bad new->ll_next pointer in insert_link
0x197fb5f0 0x19800b80 0x197e64d0
Aborted (core dumped)
#

# gdb -c core.20648 /usr/local/sbin/pbs_server
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Reading symbols from /usr/local/lib/libtorque.so.2...done.
Loaded symbols for /usr/local/lib/libtorque.so.2
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `pbs_server -D'.
Program terminated with signal 6, Aborted.
[New process 20648]
#0  0x0000003200030265 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003200030265 in raise () from /lib64/libc.so.6
#1  0x0000003200031d10 in abort () from /lib64/libc.so.6
#2  0x00002b8f16a41826 in insert_link (old=0x197e7d20, new=0x197e64d0, pobj=<value optimized out>, position=<value optimized out>) at ../Libifl/list_link.c:148
#3  0x0000000000431516 in svr_enquejob (pjob=0x197e64a0) at svr_jobfunc.c:376
#4  0x0000000000432f5b in svr_movejob (jobp=0x197e64a0, destination=<value optimized out>, req=0x198161d0) at svr_movejob.c:309
#5  0x0000000000423020 in req_movejob (req=0x198161d0) at req_movejob.c:163
#6  0x0000000000419c66 in process_request (sfds=10) at process_request.c:695
#7  0x00002b8f16a47dee in wait_request (waittime=<value optimized out>, SState=0x71eb18) at ../Libnet/net_server.c:507
#8  0x0000000000417fcc in main_loop () at pbsd_main.c:1186
#9  0x0000000000418955 in main (argc=2, argv=<value optimized out>) at pbsd_main.c:1741
(gdb) quit
#



# pbs_server -D
pbs_server is up
Assertion failed, bad pointer in insert_link
ERROR:  bad new->ll_prior pointer in insert_link
ERROR:  bad new->ll_next pointer in insert_link
0x184d1520 0x1849b320 0x184a67c0
Aborted (core dumped)
#

# gdb -c core.20705 /usr/local/sbin/pbs_server
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Reading symbols from /usr/local/lib/libtorque.so.2...done.
Loaded symbols for /usr/local/lib/libtorque.so.2
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `pbs_server -D'.
Program terminated with signal 6, Aborted.
[New process 20705]
#0  0x0000003200030265 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003200030265 in raise () from /lib64/libc.so.6
#1  0x0000003200031d10 in abort () from /lib64/libc.so.6
#2  0x00002adee7c43826 in insert_link (old=0x184f9170, new=0x184a67c0, pobj=<value optimized out>, position=<value optimized out>) at ../Libifl/list_link.c:148
#3  0x0000000000431516 in svr_enquejob (pjob=0x184a6790) at svr_jobfunc.c:376
#4  0x0000000000432f5b in svr_movejob (jobp=0x184a6790, destination=<value optimized out>, req=0x184f7820) at svr_movejob.c:309
#5  0x0000000000423020 in req_movejob (req=0x184f7820) at req_movejob.c:163
#6  0x0000000000419c66 in process_request (sfds=10) at process_request.c:695
#7  0x00002adee7c49dee in wait_request (waittime=<value optimized out>, SState=0x71eb18) at ../Libnet/net_server.c:507
#8  0x0000000000417fcc in main_loop () at pbsd_main.c:1186
#9  0x0000000000418955 in main (argc=2, argv=<value optimized out>) at pbsd_main.c:1741
(gdb) quit
#
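
A note for anyone reading the traces: the two ll_prior/ll_next errors
suggest the link being inserted was not in an "unlinked" state when
svr_enquejob tried to queue it.  In both runs the third address printed
matches the new= argument in frame #2 (0x197e64d0 and 0x184a67c0), and
new sits 0x30 bytes past pjob, which is consistent with a link embedded
in the job structure.  That is an inference from the addresses, not
something checked against the source.  The sketch below illustrates that
kind of check, assuming the convention that an unlinked entry points to
itself.  The struct and the link_* helpers are hypothetical names for
illustration, not Torque's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list link; the ll_prior/ll_next field
 * names come from the error output, ll_obj is an assumption. */
struct link {
    struct link *ll_prior;
    struct link *ll_next;
    void        *ll_obj;   /* assumed: pointer to the owning object */
};

/* Put a link in the "unlinked" state: both pointers refer to itself. */
void link_init(struct link *l)
{
    assert(l != NULL);
    l->ll_prior = l;
    l->ll_next  = l;
    l->ll_obj   = NULL;
}

/* Nonzero when the link is not currently on any list. */
int link_is_unlinked(const struct link *l)
{
    return l->ll_prior == l && l->ll_next == l;
}

/* Insert new_link after old.  Returns 0 on success, or -1 when a
 * pointer is bad or new_link already appears to be on a list (the
 * condition that would trip an assertion like the one reported). */
int link_insert_after(struct link *old, struct link *new_link, void *obj)
{
    if (old == NULL || new_link == NULL || obj == NULL)
        return -1;
    if (old->ll_prior == NULL || old->ll_next == NULL)
        return -1;
    if (!link_is_unlinked(new_link))
        return -1;

    new_link->ll_obj     = obj;
    new_link->ll_prior   = old;
    new_link->ll_next    = old->ll_next;
    old->ll_next->ll_prior = new_link;
    old->ll_next         = new_link;
    return 0;
}

/* Take a link off its list and return it to the unlinked state, so a
 * later insert passes the self-pointer check. */
void link_remove(struct link *l)
{
    assert(l != NULL);
    l->ll_prior->ll_next = l->ll_next;
    l->ll_next->ll_prior = l->ll_prior;
    l->ll_prior = l;
    l->ll_next  = l;
}
```

Under this convention, a job that is moved between queues must be
removed from the old list (or re-initialized) before it is enqueued
again.  If either step is skipped, the second insert fails the check.
Whether that is what happens in the move-job path here is unconfirmed.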

