[torquedev] pbs_server [torque-2.1.8] crash in amd64

Rajiv Chittajallu rajive at ieee.org
Thu Jun 28 11:19:40 MDT 2007


I should mention that mom_job_sync is enabled. I am not sure whether it's
related.

Rajiv wrote on 06/28/07 at 22:15:45 +0530:
>The pbs_server on one of our amd64 boxes is occasionally crashing. Some of the
>nodes are running 32-bit pbs_mom. Everything works fine after a restart.
>
>Has anyone noticed similar failures? Here is the backtrace.
>
>Jun 28 13:09:52 node0  pbs_server[16596]: segfault at 0000000000000000 rip 0000002a9582a513 rsp 0000007fbffff0e8 error 4
>
>(gdb) bt
>#0  0x0000002a9582a513 in strstr () from /lib64/tls/libc.so.6
>#1  0x000000000040b042 in sync_node_jobs (np=0x12bfca0, jobstring_in=Variable "jobstring_in" is not available.
>) at node_manager.c:828
>#2  0x000000000040b4d6 in is_stat_get (np=0x12bfca0) at node_manager.c:1169
>#3  0x000000000040c36c in is_request (stream=1348, version=Variable "version" is not available.
>) at node_manager.c:1940
>#4  0x0000000000410330 in do_rpp (stream=1348) at pbsd_main.c:317
>#5  0x00000000004103e2 in rpp_request (fd=0) at pbsd_main.c:363
>#6  0x0000002a95689c41 in wait_request (waittime=Variable "waittime" is not available.
>) at ../Libnet/net_server.c:320
>#7  0x00000000004113eb in main (argc=Variable "argc" is not available.
>) at pbsd_main.c:1123
>(gdb) 
>
>Thanks,
>Rajiv
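
Reading the trace: a fault address of 0 inside strstr() suggests that one of
its two arguments was NULL when sync_node_jobs() called it. I have not checked
what node_manager.c:828 actually does, so the following is only a sketch of
the kind of guard that would avoid this class of crash. The function name
job_in_list and its parameters are hypothetical and are not taken from the
TORQUE source.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT code from node_manager.c.
 *
 * Returns 1 if jobid occurs as a substring of joblist, 0 otherwise.
 * A NULL argument (or an empty jobid) is treated as "not found"
 * instead of being handed to strstr(), which would dereference it.
 */
int job_in_list(const char *joblist, const char *jobid)
{
    if (joblist == NULL || jobid == NULL || *jobid == '\0')
        return 0;

    return strstr(joblist, jobid) != NULL;
}
```

A guard like this would only hide the symptom. Why the pointer ends up NULL
in the first place (for example, whether it is connected to mom_job_sync or
to the mix of 32-bit and 64-bit hosts) would still need to be tracked down.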

