[torquedev] potential double free in qdel

Glen Beane glen.beane at gmail.com
Wed Jul 21 21:12:21 MDT 2010


I'm going to open a Bugzilla report for this soon, but I'll throw it
out to the dev list first.




I noticed a double free in qdel in certain cases a few years ago, and
I have only now started digging deeper.

I do a lot of development and testing on my Mac.  I noticed if I
passed a bad job id to qdel I would get something like this printed to
stderr:

malloc: *** error for object 0x100100080: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug



After some debugging I discovered this is being caused by a double
free: the memory is allocated by a strdup() call and later released
with free().  The same pointer is then passed to free() again at a
later point.
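
To make the pattern concrete, here is a minimal sketch of the defensive
idiom (free, then set the pointer to NULL).  The struct and function
names are hypothetical stand-ins, not the actual TORQUE library code;
the only point is that a second cleanup call becomes a no-op once the
pointer has been cleared.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the connection table. */
struct conn_slot {
    char *errtxt;  /* heap copy of the last server error text, or NULL */
};

/* Store a private copy of msg, releasing any earlier text first. */
int slot_set_errtxt(struct conn_slot *c, const char *msg)
{
    char *copy = NULL;

    if (msg != NULL) {
        copy = strdup(msg);
        if (copy == NULL)
            return -1;  /* out of memory: keep the old text */
    }
    free(c->errtxt);
    c->errtxt = copy;
    return 0;
}

/* Free the text and clear the pointer, so a repeated call is a no-op. */
void slot_clear_errtxt(struct conn_slot *c)
{
    free(c->errtxt);  /* free(NULL) does nothing */
    c->errtxt = NULL;
}

/* Two cleanup paths hit the same slot; returns 0 if both were harmless. */
int demo_double_clear(void)
{
    struct conn_slot c = { NULL };

    if (slot_set_errtxt(&c, "Unknown Job Id") != 0)
        return -1;
    slot_clear_errtxt(&c);  /* first cleanup path */
    slot_clear_errtxt(&c);  /* second cleanup path */
    assert(c.errtxt == NULL);
    return 0;
}
```

Without the assignment to NULL in slot_clear_errtxt(), the second call
would hand the stale pointer to free() again, which is the kind of
error the Mac allocator reports above.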


If you pass an unknown job ID to qdel, the error message returned
from the server (stored in connection[connect].ch_errtxt) is freed
twice somewhere in the library calls.  This error string is also
printed _after_ it has been freed.

There are a few places where this string can get freed, but I started
by looking at pbs_disconnect, since everywhere else I looked the
pointer is set to NULL after it is freed.  Setting it to NULL in
pbs_disconnect gets rid of the double free, but results in a generic
error message being printed by prt_job_err in qdel:

qdel: Server returned error 15001 for job 123.marlin

rather than

qdel: Unknown Job Id 123.marlin
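
The two outputs are consistent with a fallback of roughly the following
shape: print the server's text if the pointer is non-NULL, otherwise
print the numeric error.  This is only a sketch modelled on the two
messages above.  format_job_err and its parameters are hypothetical and
are not the real prt_job_err.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter modelled on the two qdel outputs quoted above. */
int format_job_err(char *buf, size_t len, const char *cmd,
                   const char *job_id, int err, const char *errtxt)
{
    if (errtxt != NULL)  /* server text still available */
        return snprintf(buf, len, "%s: %s %s", cmd, errtxt, job_id);

    /* text pointer already cleared: fall back to the error number */
    return snprintf(buf, len, "%s: Server returned error %d for job %s",
                    cmd, err, job_id);
}

/* Returns 0 if both branches reproduce the messages quoted above. */
int demo_messages(void)
{
    char buf[128];
    int n;

    n = format_job_err(buf, sizeof buf, "qdel", "123.marlin", 15001,
                       "Unknown Job Id");
    assert(n > 0 && (size_t)n < sizeof buf);
    if (strcmp(buf, "qdel: Unknown Job Id 123.marlin") != 0)
        return -1;

    n = format_job_err(buf, sizeof buf, "qdel", "123.marlin", 15001, NULL);
    assert(n > 0 && (size_t)n < sizeof buf);
    if (strcmp(buf, "qdel: Server returned error 15001 for job 123.marlin") != 0)
        return -1;

    return 0;
}
```

If the real code behaves like this, getting the generic message after
the NULL change means the text had already been released by the time
the message was formatted.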



This means that pbs_disconnect has already been called on the
connection before prt_job_err runs, and therefore before the
pbs_disconnect call at the end of the
"if (stat && (pbs_errno != PBSE_UNKJOBID))" block.


Commenting out the "if (locate_job(job_id_out, server_out,
rmt_server))" block also removes the double free, so it seems like
the following happens when a bad job id is passed to qdel:

1. somehow locate_job() calls pbs_disconnect on the connection that was
   opened in qdel, and connection[connect].ch_errtxt is freed
2. prt_job_err uses connection[connect].ch_errtxt even though it has
   been freed
3. pbs_disconnect is called again and frees
   connection[connect].ch_errtxt a second time

locate_job calls pbs_connect / pbs_disconnect again, but that
shouldn't affect the connection opened in qdel, since pbs_connect
should return a new connection number.
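
As a rough illustration of that expectation, here is a toy connection
table in which each connect hands out an unused slot and a disconnect
on an already-closed handle is rejected instead of freeing again.
Everything in it (toy_connect, toy_disconnect, the in_use flag) is
hypothetical and not taken from the TORQUE sources.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_CONN 4

/* Hypothetical connection slot: an in-use flag plus the error text. */
struct toy_conn {
    int   in_use;
    char *errtxt;
};

static struct toy_conn toy_table[TOY_MAX_CONN];

/* Return the index of the first unused slot, or -1 if the table is full. */
int toy_connect(void)
{
    int i;

    for (i = 0; i < TOY_MAX_CONN; i++) {
        if (!toy_table[i].in_use) {
            toy_table[i].in_use = 1;
            toy_table[i].errtxt = NULL;
            return i;
        }
    }
    return -1;
}

/* Close a handle; a bad or already-closed handle returns -1 untouched. */
int toy_disconnect(int h)
{
    if (h < 0 || h >= TOY_MAX_CONN || !toy_table[h].in_use)
        return -1;
    free(toy_table[h].errtxt);
    toy_table[h].errtxt = NULL;
    toy_table[h].in_use = 0;
    return 0;
}

/* Outer and inner connections get distinct handles; returns 0 on success. */
int demo_handles(void)
{
    int outer = toy_connect();  /* e.g. the connection opened in qdel */
    int inner = toy_connect();  /* e.g. a nested connect in a helper */

    assert(outer >= 0 && inner >= 0 && inner != outer);
    if (toy_disconnect(inner) != 0)
        return -1;
    if (toy_disconnect(outer) != 0)
        return -1;
    /* A second disconnect on the same handle is detected, not repeated. */
    return toy_disconnect(outer) == -1 ? 0 : -1;
}
```

In a table like this, closing the inner handle cannot touch the outer
slot, so a free of the outer connection's text from inside a helper
would have to come from the helper being given, or reusing, the outer
handle.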


Can anyone else reproduce this double free, and what do you see if
you run qdel in gdb and put a breakpoint on pbs_disconnect?

