[torquedev] Memory leak in pbs_mom

Steve Snelgrove ssnelgrove at clusterresources.com
Fri Nov 9 17:30:55 MST 2007


There has been a report of a memory leak in pbs_mom.  This becomes 
noticeable after running many thousands of jobs.

Some test runs under valgrind point to a problem in post_epilogue, in 
catch_child.c.

==12438== 317 bytes in 4 blocks are still reachable in loss record 14 of 29
==12438==    at 0x4021AA4: calloc (vg_replace_malloc.c:279)
==12438==    by 0x807BB7D: attrlist_alloc (attr_func.c:316)
==12438==    by 0x807BC21: attrlist_create (attr_func.c:378)
==12438==    by 0x807ADDB: encode_size (attr_fn_size.c:201)
==12438==    by 0x806308C: encode_used (requests.c:1981)
==12438==    by 0x804CE80: post_epilogue (catch_child.c:1040)
==12438==    by 0x8078063: scan_for_terminated (mom_start.c:459)
==12438==    by 0x805FE2E: main (mom_main.c:5756)



In post_epilogue, the variable preq is assigned from alloc_br twice, and 
neither allocation seems to have a corresponding call to free_br.

The other, similar routines in this file all seem to clean up preq with 
the following sequence of code:

    free_br(preq);
    shutdown(sock, SHUT_RDWR);
    close_conn(sock);
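
To make the suspected pattern concrete, here is a minimal standalone sketch. 
It is not TORQUE code: struct request, alloc_req, free_req, live_requests 
and both post_* functions are hypothetical stand-ins for the batch request 
type, alloc_br, free_br and post_epilogue. It only shows how allocating 
into preq twice without a matching free leaves both blocks outstanding, 
and how freeing after each use avoids that.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the batch request structure. */
struct request {
    int type;
};

/* Number of requests allocated but not yet freed. */
int live_requests = 0;

/* Stand-in for alloc_br: returns a zeroed request, or NULL on failure. */
struct request *alloc_req(int type)
{
    struct request *r = calloc(1, sizeof(*r));
    if (r != NULL) {
        r->type = type;
        live_requests++;
    }
    return r;
}

/* Stand-in for free_br: releases a request; NULL is ignored. */
void free_req(struct request *r)
{
    if (r != NULL) {
        live_requests--;
        free(r);
    }
}

/* Suspected pattern: preq is allocated twice and never freed.
 * Returns how many requests this call left outstanding. */
int post_without_free(void)
{
    int before = live_requests;
    struct request *preq;

    preq = alloc_req(1);   /* first use */
    preq = alloc_req(2);   /* second use; the first block is now lost */
    (void)preq;            /* returns without freeing either block */

    return live_requests - before;
}

/* Same flow with each allocation released once it is no longer needed. */
int post_with_free(void)
{
    int before = live_requests;
    struct request *preq;

    preq = alloc_req(1);
    free_req(preq);        /* pair the first alloc with a free */

    preq = alloc_req(2);
    free_req(preq);        /* pair the second alloc with a free */

    return live_requests - before;
}
```

Assuming the allocations succeed, post_without_free() should report two 
outstanding requests per call and post_with_free() should report none. 
Whether the real post_epilogue hands preq off to something that frees it 
later is exactly what I have not been able to confirm.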

I am still new to this code and am wondering if someone with more 
experience could look at this and confirm whether it is a real problem.

Thanks,
Steve



