[torquedev] Memory leak in pbs_mom
Steve Snelgrove
ssnelgrove at clusterresources.com
Fri Nov 9 17:30:55 MST 2007
There has been a report of a memory leak in pbs_mom. This becomes
noticeable after running many thousands of jobs.
Running some tests with valgrind points to a problem in the
post_epilogue routine in catch_child.c.
==12438== 317 bytes in 4 blocks are still reachable in loss record 14 of 29
==12438== at 0x4021AA4: calloc (vg_replace_malloc.c:279)
==12438== by 0x807BB7D: attrlist_alloc (attr_func.c:316)
==12438== by 0x807BC21: attrlist_create (attr_func.c:378)
==12438== by 0x807ADDB: encode_size (attr_fn_size.c:201)
==12438== by 0x806308C: encode_used (requests.c:1981)
==12438== by 0x804CE80: post_epilogue (catch_child.c:1040)
==12438== by 0x8078063: scan_for_terminated (mom_start.c:459)
==12438== by 0x805FE2E: main (mom_main.c:5756)
In this routine, post_epilogue, the variable preq is used twice with
alloc_br and does not seem to have corresponding invocations of free_br.
The other similar routines in this file all seem to clean up preq
with the following sequence of code.
free_br(preq);
shutdown(sock,SHUT_RDWR);
close_conn(sock);
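To illustrate the pairing I have in mind, here is a minimal, self-contained sketch. It is not the TORQUE code: struct batch_request, alloc_br and free_br below are simplified stand-ins that I wrote for this example, the live_requests counter and the request type values 1 and 2 are hypothetical, and the shutdown/close_conn calls are left out. It only shows how two alloc_br calls on the same preq with no free_br leave two requests outstanding, while pairing each alloc_br with a free_br leaves none.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for TORQUE's batch request structure (assumption). */
struct batch_request {
    int rq_type;
};

/* Hypothetical counter: requests allocated but not yet freed. */
static int live_requests = 0;

/* Stand-in for alloc_br: allocate a zeroed request of the given type. */
static struct batch_request *alloc_br(int type)
{
    struct batch_request *preq = calloc(1, sizeof(*preq));
    if (preq != NULL) {
        preq->rq_type = type;
        live_requests++;
    }
    return preq;
}

/* Stand-in for free_br: release a request obtained from alloc_br. */
static void free_br(struct batch_request *preq)
{
    if (preq == NULL)
        return;
    assert(live_requests > 0);
    free(preq);
    live_requests--;
}

/* Uses preq twice with alloc_br and never calls free_br.
 * Returns the number of requests left outstanding by this call. */
int leaky_sequence(void)
{
    int before = live_requests;
    struct batch_request *preq;

    preq = alloc_br(1);   /* first request */
    preq = alloc_br(2);   /* second request; the first pointer is overwritten */
    (void)preq;           /* neither request is ever freed */

    return live_requests - before;
}

/* Same two uses of preq, each followed by free_br.
 * Returns the number of requests left outstanding by this call. */
int paired_sequence(void)
{
    int before = live_requests;
    struct batch_request *preq;

    preq = alloc_br(1);
    free_br(preq);        /* done with the first request */

    preq = alloc_br(2);
    free_br(preq);        /* done with the second request */

    return live_requests - before;
}
```

In this sketch, leaky_sequence() leaves two requests outstanding and paired_sequence() leaves none. Whether post_epilogue can safely free preq at those points depends on who owns the request afterwards, which is the part I am unsure about.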
I am still new to this code and am wondering if someone with more
experience could look at this and see if this is a problem.
Thanks,
Steve