[torquedev] [Bug 81] Timeouts caused by hanging Disconnect requests

bugzilla-daemon at supercluster.org bugzilla-daemon at supercluster.org
Thu Sep 23 11:23:34 MDT 2010


Ken Nielson <knielson at adaptivecomputing.com> changed:

           What    |Removed                     |Added
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #3 from Ken Nielson <knielson at adaptivecomputing.com> 2010-09-23 11:23:33 MDT ---
(In reply to comment #2)
> I'm pretty confident that it will solve the problem.
> What I'm not that confident is if this is actually valid. The whole point of
> the code block is to actually confirm the disconnect. If we don't need that,
> then we can simply skip the read part completely and just close the socket.

You have a point. It does not look like we really care about what is read. The
block simply waits for read() to return 0 or -1 and does not act on any data it
receives.

But if you look at process_request on the server side, you will see that when
PBS_BATCH_Disconnect is received, process_request calls close_conn, which calls
close() on the socket. The read in pbs_disconnect will receive the FIN from that
close, and we then know the server side of the socket is done.

> Plus I'm betting that the bug 76 is actually caused by the same problem as this
> one. On the other side of the connection there is a forked process that still
> holds the open socket (although it shouldn't).

There are no forked processes. It is all handled in process_request.
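
Why the forked-process theory from comment #2 would produce a hang can still be illustrated with a bounded variant. This is an assumption-laden sketch, not the fix adopted for this bug: `drain_with_timeout` and `simulate_close` are hypothetical, poll() is swapped in to cap each wait, and dup() stands in for the descriptor a forked child would inherit. A socket is only torn down (and the FIN sent) when the last descriptor referring to it is closed, so any extra holder leaves the client waiting.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variant: wait at most timeout_ms for each read.
 * Returns 0 on end-of-file, -1 on error, and 1 if the peer sent neither
 * data nor a FIN within timeout_ms. */
static int drain_with_timeout(int sock, int timeout_ms)
{
    char buf[256];

    for (;;) {
        struct pollfd pfd = { .fd = sock, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);

        if (ready == 0)
            return 1;                  /* timed out: socket still held open */
        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        ssize_t n = read(sock, buf, sizeof buf);
        if (n == 0)
            return 0;                  /* FIN received */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* n > 0: discard and keep waiting */
    }
}

/* Hypothetical simulation: if keep_duplicate is set, a second descriptor
 * for the server end (what a forked child would inherit) stays open while
 * the original is closed, so no FIN reaches the client. */
static int simulate_close(int keep_duplicate, int timeout_ms)
{
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    int dup_fd = keep_duplicate ? dup(sv[1]) : -1;
    close(sv[1]);                      /* "server" closes its descriptor */

    rc = drain_with_timeout(sv[0], timeout_ms);

    if (dup_fd >= 0)
        close(dup_fd);
    close(sv[0]);
    return rc;
}
```

`simulate_close(0, 1000)` returns 0, while `simulate_close(1, 100)` returns 1: with the duplicate still open, the client sees no end-of-file and only the timeout ends the wait.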

Ken Nielson
