[torquedev] [Bug 194] New: pbs_server crashes when removing and adding nodes.

bugzilla-daemon at supercluster.org
Fri May 11 05:40:45 MDT 2012


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=194

           Summary: pbs_server crashes when removing and adding nodes.
           Product: TORQUE
           Version: 3.0.x
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: critical
          Priority: P5
         Component: pbs_server
        AssignedTo: dbeer at adaptivecomputing.com
        ReportedBy: roy.dragseth at uit.no
                CC: torquedev at supercluster.org
   Estimated Hours: 0.0


This is for v3.0.5.

pbs_server will segfault if one quickly deletes and creates nodes:

[root at hpc1 ~]# cat /tmp/removeandaddnodes.txt 
qmgr -c "delete node compute-0-0" 2> /dev/null
qmgr -c "create node compute-0-0 np=2,ntype=cluster" 2> /dev/null
qmgr -c "delete node compute-0-1" 2> /dev/null
qmgr -c "create node compute-0-1 np=2,ntype=cluster" 2> /dev/null
qmgr -c "delete node compute-0-2" 2> /dev/null
qmgr -c "create node compute-0-2 np=2,ntype=cluster" 2> /dev/null
[root at hpc1 ~]# sh -x /tmp/removeandaddnodes.txt
+ qmgr -c 'delete node compute-0-0'
+ qmgr -c 'create node compute-0-0 np=2,ntype=cluster'
+ qmgr -c 'delete node compute-0-1'
+ qmgr -c 'create node compute-0-1 np=2,ntype=cluster'
+ qmgr -c 'delete node compute-0-2'
+ qmgr -c 'create node compute-0-2 np=2,ntype=cluster'
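
For repeated testing, the command file above could be generated rather than typed by hand. The following is only a sketch and not part of the original report: the helper name gen_node_cmds is hypothetical, and it assumes the nodes are named compute-0-0 upward with np=2, as in the file shown above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): print one
# delete/create qmgr command pair per node, for nodes named
# compute-0-0 .. compute-0-(N-1), using np=2 as in the report.
gen_node_cmds() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'qmgr -c "delete node compute-0-%s" 2> /dev/null\n' "$i"
        printf 'qmgr -c "create node compute-0-%s np=2,ntype=cluster" 2> /dev/null\n' "$i"
        i=$((i + 1))
    done
}

# Example use (commented out so that sourcing this file runs nothing):
#   gen_node_cmds 3 > /tmp/removeandaddnodes.txt
#   sh -x /tmp/removeandaddnodes.txt
```

With a count of 3 this prints the same six commands as the file listed above.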


Running pbs_server under gdb gives the following backtrace:

[root at hpc1 ~]# gdb /opt/torque/sbin/pbs_server 
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/torque/sbin/pbs_server...(no debugging symbols found)...done.
(gdb) set args -D
(gdb) run
Starting program: /opt/torque/sbin/pbs_server -D
pbs_server is up

Program received signal SIGSEGV, Segmentation fault.
0x0000000000410470 in update_nodes_file ()
Missing separate debuginfos, use: debuginfo-install torque-3.0.5-1.x86_64
(gdb) bt
#0  0x0000000000410470 in update_nodes_file ()
#1  0x0000000000425534 in mgr_node_create ()
#2  0x0000000000427485 in req_manager ()
#3  0x000000000041de9a in process_request ()
#4  0x00002aaaaaacfb39 in wait_request (waittime=<value optimized out>, SState=0x72f438) at ../Libnet/net_server.c:507
#5  0x000000000041c03b in main_loop ()
#6  0x000000000041cd55 in main ()
(gdb)

The same script works fine with TORQUE 2.4.11 and 4.0.1.
