[torquedev] potential double free in qdel
Glen Beane
glen.beane at gmail.com
Wed Jul 21 21:12:21 MDT 2010
I'm going to open a Bugzilla report for this soon, but I'll throw it
out to the dev list first.
I noticed a double free in qdel in certain cases a few years ago, and
I have just started digging deeper.
I do a lot of development and testing on my Mac. I noticed if I
passed a bad job id to qdel I would get something like this printed to
stderr:
malloc: *** error for object 0x100100080: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
After some debugging I discovered this is caused by a double free:
the memory is allocated by a strdup() call and later released
with free(). The pointer is then passed to free() again at a later
point.
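To make the pattern concrete, here is a minimal sketch of the free-and-clear idiom that avoids this class of bug. It is not the TORQUE code: struct conn_slot, set_errtxt() and clear_errtxt() are hypothetical names made up for illustration, and only the ch_errtxt field name is taken from the description in this mail.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one slot of the connection table; only the
 * error-text field matters for this sketch. */
struct conn_slot {
    char *ch_errtxt;
};

/* Store a private copy of the server's error text (allocated by strdup).
 * Returns 0 on success, -1 if the copy could not be allocated. */
static int set_errtxt(struct conn_slot *c, const char *msg)
{
    free(c->ch_errtxt);          /* free(NULL) is a no-op */
    c->ch_errtxt = strdup(msg);
    return c->ch_errtxt != NULL ? 0 : -1;
}

/* Release the error text and clear the pointer, so that calling this a
 * second time on the same slot is harmless. */
static void clear_errtxt(struct conn_slot *c)
{
    free(c->ch_errtxt);
    c->ch_errtxt = NULL;
}
```

With the pointer cleared after free(), a second clear_errtxt() on the same slot ends up calling free(NULL), which the C standard defines as doing nothing.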
If you pass an unknown job ID to qdel then the error message returned
from the server (stored in connection[connect].ch_errtxt) is freed
twice somewhere in library calls. This error string is also printed
_after_ it has been freed.
There are a few places this string can get freed, but I first started
looking at pbs_disconnect since everywhere else I looked the pointer
is set to NULL after it is freed. Setting it to NULL in
pbs_disconnect gets rid of the double free, but results in a generic
error message being printed by prt_job_err in qdel:
qdel: Server returned error 15001 for job 123.marlin
rather than
qdel: Unknown Job Id 123.marlin
This means that pbs_disconnect is called on the connection before
prt_job_err is called, and before pbs_disconnect is called at the end
of the "if (stat && (pbs_errno != PBSE_UNKJOBID))" block.
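The two messages quoted above suggest a fallback along the following lines. This is only a sketch inferred from the output and is not the actual prt_job_err source: format_job_err() and its parameters are hypothetical, and the assumption is that the generic text is used whenever no error string is available.

```c
#include <stdio.h>

/* Hypothetical formatter modelled on the two messages quoted above.
 * If errtxt is available it is printed with the job id; otherwise the
 * numeric error code is reported instead. Returns snprintf's result. */
static int format_job_err(char *buf, size_t size, const char *cmd,
                          const char *errtxt, int err, const char *job_id)
{
    if (errtxt != NULL)
        return snprintf(buf, size, "%s: %s %s", cmd, errtxt, job_id);

    return snprintf(buf, size, "%s: Server returned error %d for job %s",
                    cmd, err, job_id);
}
```

Under that assumption, clearing ch_errtxt before the message is printed would select the second branch, which matches the generic message seen after the change to pbs_disconnect.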
Commenting out the "if (locate_job(job_id_out, server_out,
rmt_server))" block also removes the double free, so it seems like
the following happens when a bad job id is passed to qdel:
1. Somehow locate_job() calls pbs_disconnect on the connection that was
   opened in qdel, and connection[connect].ch_errtxt is freed.
2. prt_job_err uses connection[connect].ch_errtxt even though it has been freed.
3. pbs_disconnect is called again and frees connection[connect].ch_errtxt again.
locate_job calls pbs_connect / pbs_disconnect again, but that
shouldn't affect the connection opened in qdel since pbs_connect
should return a new connection number.
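One way to check this hypothesis without a debugger would be to count disconnect calls per connection number and refuse a disconnect on a slot that is not open. The sketch below is a toy model of that idea, not the library code: sketch_connect(), sketch_disconnect(), the table layout and the in_use flag are all assumptions made for illustration.

```c
#include <stdlib.h>

#define SKETCH_NCONNECTS 4

/* Hypothetical connection slot: an in-use flag plus the error text. */
struct sketch_conn {
    int   in_use;
    char *errtxt;
};

static struct sketch_conn sketch_table[SKETCH_NCONNECTS];
static int sketch_disconnect_calls[SKETCH_NCONNECTS];

/* Hand out the first free slot, or -1 if the table is full. */
static int sketch_connect(void)
{
    for (int i = 0; i < SKETCH_NCONNECTS; i++) {
        if (!sketch_table[i].in_use) {
            sketch_table[i].in_use = 1;
            sketch_table[i].errtxt = NULL;
            return i;
        }
    }
    return -1;
}

/* Count every call for the slot, then release it only if it is open.
 * Returns 0 on success, -1 for a bad index or a slot already closed. */
static int sketch_disconnect(int c)
{
    if (c < 0 || c >= SKETCH_NCONNECTS)
        return -1;

    sketch_disconnect_calls[c]++;
    if (!sketch_table[c].in_use)
        return -1;               /* second disconnect: nothing to free */

    free(sketch_table[c].errtxt);
    sketch_table[c].errtxt = NULL;
    sketch_table[c].in_use = 0;
    return 0;
}
```

In this toy model a call count of 2 for the connection opened by qdel would confirm the sequence above. Note that a released slot can be handed out again by the next sketch_connect(), so the guard only catches a repeated disconnect while the slot is still unused.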
Can anyone else reproduce this double free, and what do you see if
you run qdel in gdb and put a breakpoint on pbs_disconnect?