Bugzilla – Bug 156
FIFO pbs_sched crash in check_nodes
Last modified: 2011-09-01 14:11:53 MDT
pbs_sched dumped core this morning, and I got this backtrace:

#0  pbs_rescquery (c=0, resclist=0xffd5fa08, num_resc=1, available=0xffd5fa18, allocated=0xffd5fa14, reserved=0xffd5fa10, down=0xffd5fa0c) at ../Libifl/pbsD_resc.c:215
#1  0x08054ff0 in check_nodes (pbs_sd=0, jinfo=0x8d0ac68, ninfo_arr=0x0) at check.c:507
#2  0x0805523c in is_ok_to_run_job (pbs_sd=0, sinfo=0x88089f0, qinfo=0x882e430, jinfo=0x8d0ac68) at check.c:185
#3  0x0804ce75 in scheduling_cycle (sd=0) at fifo.c:486
#4  0x0804d119 in schedule (cmd=-2754024, sd=0) at fifo.c:383
#5  0x0804bc24 in main (argc=1, argv=0xffd5fe74) at pbs_sched.c:1220

It appears that pbs_rescquery() expects arrays for available, allocated, reserved, and down, but check_nodes() passes it pointers to single ints instead.

I'm using TORQUE 2.4.16.
Actually, passing int by reference doesn't really matter since num_resc=1. Upon slightly further investigation, the problem is this:

(gdb) print reply->brp_un
$3 = {brp_jid = '\0' <repeats 1045 times>, brp_select = 0x0, brp_status = {ll_prior = 0x0, ll_next = 0x0, ll_struct = 0x0}, brp_statc = 0x0, brp_txt = {brp_txtlen = 0, brp_str = 0x0}, brp_locate = '\0' <repeats 1024 times>, brp_rescq = {brq_number = 0, brq_avail = 0x0, brq_alloc = 0x0, brq_resvd = 0x0, brq_down = 0x0}}
(gdb) print reply->brp_un.brp_rescq.brq_avail
$4 = (int *) 0x0
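
To illustrate the failure mode above, here is a minimal sketch of the kind of guard that would avoid dereferencing a NULL brq_avail. It is not the TORQUE source or an actual patch: struct rescq_reply is a simplified stand-in whose field names are taken from the gdb output, and copy_rescq() is a hypothetical helper invented for this example.

```c
#include <stddef.h>

/* Simplified stand-in for the brp_rescq member seen in the gdb output.
 * Assumption: only the field names come from the report; the real
 * TORQUE definition is not reproduced here. */
struct rescq_reply {
    int  brq_number;   /* number of entries in each array below */
    int *brq_avail;
    int *brq_alloc;
    int *brq_resvd;
    int *brq_down;
};

/* Hypothetical helper: copy num_resc results out of a reply into the
 * caller's buffers. Returns 0 on success, or -1 if the reply is
 * missing, holds fewer than num_resc entries, or has a NULL array. */
int copy_rescq(const struct rescq_reply *r, int num_resc,
               int *available, int *allocated, int *reserved, int *down)
{
    int i;

    if (r == NULL || num_resc < 1 || r->brq_number < num_resc)
        return -1;
    if (r->brq_avail == NULL || r->brq_alloc == NULL ||
        r->brq_resvd == NULL || r->brq_down == NULL)
        return -1;

    for (i = 0; i < num_resc; i++) {
        available[i] = r->brq_avail[i];
        allocated[i] = r->brq_alloc[i];
        reserved[i]  = r->brq_resvd[i];
        down[i]      = r->brq_down[i];
    }
    return 0;
}
```

With a reply like the one printed above (brq_number = 0 and all four pointers NULL), this helper returns -1 and leaves the caller's ints untouched, so the caller can treat the query as failed instead of crashing.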