Bug 156 - FIFO pbs_sched crash in check_nodes
Status: NEW
Product: TORQUE
Component: pbs_sched
Version: 2.4.x
Hardware: PC Linux
Importance: P5 normal
Assigned To: John Rosenquist
Reported: 2011-09-01 13:00 MDT by Ari Pollak
Modified: 2011-09-01 14:11 MDT
CC List: 1 user

Description Ari Pollak 2011-09-01 13:00:51 MDT
pbs_sched dumped core this morning, and I got this backtrace:
#0  pbs_rescquery (c=0, resclist=0xffd5fa08, num_resc=1, available=0xffd5fa18,
    allocated=0xffd5fa14, reserved=0xffd5fa10, down=0xffd5fa0c) at ../Libifl/pbsD_resc.c:215
#1  0x08054ff0 in check_nodes (pbs_sd=0, jinfo=0x8d0ac68, ninfo_arr=0x0) at check.c:507
#2  0x0805523c in is_ok_to_run_job (pbs_sd=0, sinfo=0x88089f0, qinfo=0x882e430, jinfo=0x8d0ac68)
    at check.c:185
#3  0x0804ce75 in scheduling_cycle (sd=0) at fifo.c:486
#4  0x0804d119 in schedule (cmd=-2754024, sd=0) at fifo.c:383
#5  0x0804bc24 in main (argc=1, argv=0xffd5fe74) at pbs_sched.c:1220


It appears that pbs_rescquery() expects arrays for available, allocated,
reserved, and down, but check_nodes() passes it the addresses of single ints
instead.
I'm using TORQUE 2.4.16.

Comment 1 Ari Pollak 2011-09-01 14:11:53 MDT
Actually, passing a single int by reference doesn't matter, since with
num_resc=1 only one element of each output array is written.
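The num_resc=1 argument can be checked with a small stand-alone sketch: in C, the address of a single int is a valid pointer to a one-element array, so a callee that writes num_resc entries stays in bounds when num_resc is 1. fill_counts() and demo_single_ints() are hypothetical helpers that only mimic the output-parameter shape of pbs_rescquery(); they are not TORQUE code.

```c
/* Hypothetical stand-in (not TORQUE code) with the same output-parameter
 * shape as pbs_rescquery(): it writes num_resc entries into each array. */
void fill_counts(int num_resc, int *available, int *allocated,
                 int *reserved, int *down)
{
    for (int i = 0; i < num_resc; i++) {
        available[i] = 10 + i;
        allocated[i] = 20 + i;
        reserved[i]  = 30 + i;
        down[i]      = 40 + i;
    }
}

/* Mirrors what check_nodes() does: pass the addresses of single ints.
 * &x behaves as a pointer to a one-element array, so with num_resc == 1
 * the callee only touches index 0 and stays in bounds. */
int demo_single_ints(void)
{
    int avail = -1, alloc = -1, resvd = -1, down = -1;

    fill_counts(1, &avail, &alloc, &resvd, &down);
    return avail + alloc + resvd + down; /* 10 + 20 + 30 + 40 */
}
```

Passing single ints with num_resc greater than 1 would write past each variable, which is the case the original report worried about; with num_resc=1 it is harmless, so the argument types are not what crashed.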
Upon slightly further investigation, the problem is this:


(gdb) print reply->brp_un
$3 = {brp_jid = '\0' <repeats 1045 times>, brp_select = 0x0, brp_status = {ll_prior = 0x0,
    ll_next = 0x0, ll_struct = 0x0}, brp_statc = 0x0, brp_txt = {brp_txtlen = 0, brp_str = 0x0},
  brp_locate = '\0' <repeats 1024 times>, brp_rescq = {brq_number = 0, brq_avail = 0x0,
    brq_alloc = 0x0, brq_resvd = 0x0, brq_down = 0x0}}
(gdb) print reply->brp_un.brp_rescq.brq_avail
$4 = (int *) 0x0
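The reply carries brq_number = 0 and NULL for every result array, so any code in pbs_rescquery() that reads brq_avail[0] for num_resc=1 dereferences a NULL pointer. The sketch below assumes, without having checked the source at pbsD_resc.c:215, that the faulting line is such a copy out of the reply. It shows the kind of guard that would turn the crash into an error return. struct rescq_reply and copy_rescq() are hypothetical names that reuse the field names from the gdb output; the real TORQUE reply structure is a larger union and may differ.

```c
#include <stddef.h>

/* Simplified, hypothetical mirror of the resource-query part of the batch
 * reply, using the field names printed by gdb. Not the real TORQUE struct. */
struct rescq_reply {
    int  brq_number;   /* number of entries the server returned */
    int *brq_avail;
    int *brq_alloc;
    int *brq_resvd;
    int *brq_down;
};

/* Copy num_resc counts out of the reply, refusing to dereference anything
 * the reply did not actually provide. Returns 0 on success, -1 if the reply
 * is short or missing arrays (as in the gdb session: brq_number == 0 and
 * every array pointer NULL). */
int copy_rescq(const struct rescq_reply *r, int num_resc,
               int *available, int *allocated, int *reserved, int *down)
{
    if (r == NULL || num_resc < 0 || r->brq_number < num_resc)
        return -1;

    if (num_resc > 0 &&
        (r->brq_avail == NULL || r->brq_alloc == NULL ||
         r->brq_resvd == NULL || r->brq_down == NULL))
        return -1;

    for (int i = 0; i < num_resc; i++) {
        available[i] = r->brq_avail[i];
        allocated[i] = r->brq_alloc[i];
        reserved[i]  = r->brq_resvd[i];
        down[i]      = r->brq_down[i];
    }
    return 0;
}
```

With a zeroed reply like the one in this gdb session, copy_rescq(&r, 1, ...) returns -1 instead of faulting; the caller (here, check_nodes()) would then need to treat that as a failed query rather than use the output values.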