[torqueusers] Possible oversight in torque_gssapi

Ken Nielson knielson at adaptivecomputing.com
Wed Nov 25 13:44:10 MST 2009


Mike,

Each tcpdisbuf allocated starts at 262144 bytes. In 2.3.8 and beyond
these buffers can grow larger as needed for very large jobs. Non-GSS
TORQUE allocates two buffers per socket, or a little over 500 KB of
memory. With a thousand open sockets that comes to roughly 500 MB of
memory allocated.
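
For a rough sanity check on those numbers, here is a minimal sketch of
the arithmetic; the constant names are illustrative, not taken from
tcp_dis.c, and 262144 is just the initial tcpdisbuf size quoted above:

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    /* Back-of-envelope footprint of the DIS tcp buffers. */
    const size_t initial_buf  = 262144;  /* starting size of one tcpdisbuf */
    const size_t bufs_per_fd  = 2;       /* readbuf + writebuf, non-GSS    */
    const size_t open_sockets = 1000;

    size_t per_socket = initial_buf * bufs_per_fd;  /* 524288 B ~= 512 KiB */
    size_t total      = per_socket * open_sockets;  /* ~500 MiB            */

    printf("per socket:   %zu KiB\n", per_socket / 1024);
    printf("1000 sockets: %zu MiB\n", total / (1024 * 1024));
    return 0;
}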

The reason I bring this up is to understand the trade-off between 
keeping resources allocated and returning them when the socket is 
closed.  No conclusions.
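
To make the trade-off concrete, here is a minimal sketch of the two
strategies; the struct and function names are simplified stand-ins, not
TORQUE's actual code:

#include <stdlib.h>

#define INITIAL_BUF_SIZE 262144  /* matches the starting size noted above */

struct disbuf
{
    char   *buf;    /* heap storage, analogous to tdis_thebuf */
    size_t  used;   /* bytes currently in use                 */
};

/* Allocate the buffer on first use for this descriptor. */
static int setup_buf(struct disbuf *b)
{
    if (b->buf == NULL)
        b->buf = malloc(INITIAL_BUF_SIZE);

    b->used = 0;
    return (b->buf != NULL) ? 0 : -1;
}

/* Strategy A: keep the allocation and just reset it, so the next
 * request on this fd pays no malloc cost but the memory stays
 * resident for as long as the descriptor is tracked. */
static void reset_for_reuse(struct disbuf *b)
{
    b->used = 0;
}

/* Strategy B: hand the memory back when the socket closes, shrinking
 * the footprint at the cost of another malloc on the next setup. */
static void release_on_close(struct disbuf *b)
{
    free(b->buf);
    b->buf  = NULL;
    b->used = 0;
}

int main(void)
{
    struct disbuf b = { NULL, 0 };

    setup_buf(&b);          /* first connection on this fd            */
    reset_for_reuse(&b);    /* strategy A: ready for the next request */
    release_on_close(&b);   /* strategy B: give the memory back       */
    return 0;
}

If I read the thread right, vanilla TORQUE effectively takes the first
route for a reused descriptor, while the gssapi branch takes the
second, which is why everything the tcp_chan owns has to be freed in
DIS_tcp_release().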

As you pointed out, it looks as if the GSS version was losing track of
its resources. That definitely needed to be cleaned up.
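
For anyone following along with valgrind, the pattern Mike describes
below boils down to freeing a container struct while the heap blocks
its members point to are still allocated. A stripped-down illustration,
with stand-in names rather than the real tcp_chan layout:

#include <stdlib.h>

struct chanbuf { char *thebuf; };                     /* cf. tcpdisbuf */
struct chan    { struct chanbuf readbuf, writebuf; }; /* cf. tcp_chan  */

int main(void)
{
    struct chan *c = malloc(sizeof *c);  /* error checks omitted */

    c->readbuf.thebuf  = malloc(262144);
    c->writebuf.thebuf = malloc(262144);

    /* Freeing only the container leaves both 262144-byte blocks with
     * no remaining pointer to them; valgrind reports them as
     * "definitely lost".  The patch quoted below frees each member
     * buffer before freeing the container itself. */
    free(c);

    return 0;
}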

Ken

mike coyne wrote:
> I think the difference is that in the gssapi version DIS_tcp_release()
> gets called in pbs_disconnect() to clean up the security context and
> free the tcp structure for the connection, whereas in vanilla TORQUE
> pbs_disconnect does not do that cleanup and simply resets and reuses
> the tcp structure for a given file descriptor. I think what was
> happening is that it was not cleaning up the other "tp->tdis_thebuf"
> buffers, which caused pbs_mom to bloat when they were re-allocated in
> the next call to tcp_setup.
> I wonder if the
>
> free(tcparray[fd]);
> tcparray[fd] = NULL;
>
> should be removed so the only thing it does is clean up the security
> context?
>
> Mike
>
> On Wed, 2009-11-25 at 09:58 -0700, Ken Nielson wrote:
>   
>> Mike Coyne wrote:
>>     
>>> In building the gssapi version of torque from the svn archive I ran
>>> across a memory leak in pbs_mom after running valgrind.
>>>
>>> It would appear the buffers allocated in DIS_tcp_setup() didn't seem
>>> to be freed when DIS_tcp_release() was called, so I proposed adding
>>> the following…
>>>
>>> This may have resulted from a bad job of incorporating the latest
>>> update from svn into my local build, but I thought I would ask the
>>> group if this is the right place to free the buffers?
>>>
>>> Index: /trunk/torque_gss/src/lib/Libifl/tcp_dis.c
>>> ===================================================================
>>> --- /trunk/torque_gss/src/lib/Libifl/tcp_dis.c (revision 534)
>>> +++ /trunk/torque_gss/src/lib/Libifl/tcp_dis.c (revision 538)
>>> @@ -1075,16 +1075,28 @@
>>>  #ifdef GSSAPI
>>>    OM_uint32 minor;
>>>  #endif
>>> +  struct tcp_chan *tcp;
>>> +  struct tcpdisbuf *tp;
>>> +
>>>    assert (fd >= 0);
>>>
>>>    if (fd >= tcparraymax || tcparray[fd] == NULL)
>>>      return; /* Might be an RPP connection */
>>> +
>>> +  tcp = tcparray[fd];
>>> +  tp = &tcp->readbuf;
>>> +  if (tp->tdis_thebuf != NULL) free(tp->tdis_thebuf);
>>> +  tp = &tcp->writebuf;
>>> +  if (tp->tdis_thebuf != NULL) free(tp->tdis_thebuf);
>>>  #ifdef GSSAPI
>>>    if (tcparray[fd]->gssctx != GSS_C_NO_CONTEXT)
>>>      gss_delete_sec_context (&minor, &tcparray[fd]->gssctx, GSS_C_NO_BUFFER);
>>>    if (tcparray[fd]->unwrapped.value)
>>>      gss_release_buffer (&minor, &tcparray[fd]->unwrapped);
>>> +  /* fix memory loss in DIS_tcp_setup */
>>> +  tp = &tcp->gssrdbuf;
>>> +  if (tp->tdis_thebuf != NULL) free(tp->tdis_thebuf);
>>>  #endif
>>>    free(tcparray[fd]);
>>>    tcparray[fd] = NULL;
>>>
>> Mike,
>>
>> Never mind about my last e-mail. You did free the buffer for 
>> tcparray[fd]. So no seg-fault.
>>
>> Ken
>>
>>     
>
>   


