[torquedev] pbs_server [torque-2.1.8] crash in amd64
Rajiv Chittajallu
rajive at ieee.org
Thu Jun 28 11:19:40 MDT 2007
I should mention that mom_job_sync is enabled. I am not sure whether it's
related.
Rajiv wrote on 06/28/07 at 22:15:45 +0530:
>The pbs_server on one of our amd64 boxes occasionally crashes. Some of the
>nodes are running a 32-bit pbs_mom. Everything works fine after a restart.
>
>Did anyone notice similar failures? Here is the backtrace.
>
>Jun 28 13:09:52 node0 pbs_server[16596]: segfault at 0000000000000000 rip
>0000002a9582a513 rsp 0000007fbffff0e8 error 4
>
>(gdb) bt
>#0 0x0000002a9582a513 in strstr () from /lib64/tls/libc.so.6
>#1 0x000000000040b042 in sync_node_jobs (np=0x12bfca0, jobstring_in=Variable
>"jobstring_in" is not available.
>) at node_manager.c:828
>#2 0x000000000040b4d6 in is_stat_get (np=0x12bfca0) at node_manager.c:1169
>#3 0x000000000040c36c in is_request (stream=1348, version=Variable "version"
>is not available.
>) at node_manager.c:1940
>#4 0x0000000000410330 in do_rpp (stream=1348) at pbsd_main.c:317
>#5 0x00000000004103e2 in rpp_request (fd=0) at pbsd_main.c:363
>#6 0x0000002a95689c41 in wait_request (waittime=Variable "waittime" is not
>available.
>) at ../Libnet/net_server.c:320
>#7 0x00000000004113eb in main (argc=Variable "argc" is not available.
>) at pbsd_main.c:1123
>(gdb)
>
>Thanks,
>Rajiv