Bugzilla – Bug 205
pbs_server memory leak on GPU clusters.
Last modified: 2012-07-18 12:59:23 MDT
Created an attachment (id=111) [details]
missing free of decode_arst return value
This bug-report relates to:
Lukasz stated that the leak became visible after adding a number of new GPU-powered
machines. We have also just hit the same problem at our center on a new
60-node GPU cluster.
A run under valgrind in a test environment revealed the following problem:
==16295== 95,004 bytes in 3,906 blocks are definitely lost in loss record 27 of
==16295== at 0x4A0739E: malloc (vg_replace_malloc.c:207)
==16295== by 0x4C18B2C: disrst (disrst.c:133)
==16295== by 0x4158E4: is_gpustat_get (node_manager.c:1504)
==16295== by 0x416A58: is_request (node_manager.c:2612)
==16295== by 0x41E354: do_rpp (pbsd_main.c:416)
==16295== by 0x41E3DA: rpp_request (pbsd_main.c:462)
==16295== by 0x4C3372D: wait_request (net_server.c:508)
==16295== by 0x41F6B8: main_loop (pbsd_main.c:1203)
==16295== by 0x42033C: main (pbsd_main.c:1759)
A suggested patch is attached.
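For readers who do not have the source at hand, the pattern the trace points to (a string allocated by a decoder such as disrst and never released by its caller) can be sketched roughly as below. This is only an illustration, not the actual pbs_server code: read_string, release_string, handle_update_leaky, handle_update_fixed and the live_allocs counter are all hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Crude counter of allocations that have not been freed yet.
 * Hypothetical; it exists only to make the leak observable here. */
static int live_allocs = 0;

/* Hypothetical stand-in for a decoder like disrst(): returns a
 * heap-allocated copy of the input that the CALLER owns. */
static char *read_string(const char *wire)
{
    char *s;

    assert(wire != NULL);
    s = malloc(strlen(wire) + 1);
    if (s == NULL)
        return NULL;
    strcpy(s, wire);
    live_allocs++;
    return s;
}

/* Releases a string obtained from read_string(); NULL is ignored. */
static void release_string(char *s)
{
    if (s != NULL) {
        free(s);
        live_allocs--;
    }
}

/* Leaky variant: the decoded string is used and then forgotten,
 * so one block is lost on every call. */
static int handle_update_leaky(const char *wire)
{
    char *s = read_string(wire);

    if (s == NULL)
        return -1;
    return (int)strlen(s);      /* s is never released */
}

/* Fixed variant: the caller releases the string once it is done. */
static int handle_update_fixed(const char *wire)
{
    char *s = read_string(wire);
    int len;

    if (s == NULL)
        return -1;
    len = (int)strlen(s);
    release_string(s);
    return len;
}
```

Calling the leaky variant once per status update from each node would lose one small block per call, which matches the shape of the report above (many small blocks, all with the same allocation stack).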
Created an attachment (id=112) [details]
Fixes a memory leak when nodes with GPUs send updates to pbs_server
The previous patch missed a couple of cases for memory leaks. This one takes
care of the rest of the leaks.
The patch submitted by Mariusz missed a couple of cases where memory was
leaking. I have attached another patch which fixes all the leaks.
Thanks for locating this problem and pointing us in the right direction.
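A first fix often misses allocations on early-exit paths. The thread does not say which cases the first patch missed, so the sketch below is an assumption about what such cases commonly look like, not a description of the attached patch. It routes every exit through one cleanup label so the buffer is freed no matter how the function leaves. All names (dup_tracked, free_tracked, process_status, outstanding) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of tracked allocations not yet freed (demonstration only). */
static int outstanding = 0;

/* Hypothetical helper: heap-allocated copy of src, owned by the caller. */
static char *dup_tracked(const char *src)
{
    char *p;

    assert(src != NULL);
    p = malloc(strlen(src) + 1);
    if (p == NULL)
        return NULL;
    strcpy(p, src);
    outstanding++;
    return p;
}

/* Frees a copy made by dup_tracked(); NULL is ignored. */
static void free_tracked(char *p)
{
    if (p != NULL) {
        free(p);
        outstanding--;
    }
}

/* Hypothetical handler: walks a list of "name=value" status strings.
 * Returns 0 on success and -1 on a malformed entry. Every exit,
 * including the early ones, goes through the 'done' label. */
static int process_status(const char *const *items, size_t n)
{
    int rc = 0;
    char *copy = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        copy = dup_tracked(items[i]);
        if (copy == NULL) {
            rc = -1;
            goto done;
        }
        if (strchr(copy, '=') == NULL) {    /* malformed: early exit */
            rc = -1;
            goto done;
        }
        if (strcmp(copy, "end=1") == 0)     /* terminator: early exit */
            goto done;

        free_tracked(copy);                 /* normal path */
        copy = NULL;
    }

done:
    free_tracked(copy);                     /* no-op when copy is NULL */
    return rc;
}
```

Resetting copy to NULL after the normal-path free keeps the cleanup at the label from freeing the same block twice.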