[torquedev] [Bug 85] Potential 4+ hour hang in pbs_server
msbritt at umich.edu
Wed Oct 6 20:51:17 MDT 2010
The server hangs: it doesn't log and doesn't respond to any client commands. In every case I have checked there is a socket open to the down client node in SYN_SENT. The server loops trying new ports and becomes responsive only when it is restarted or when the down client node is restarted and comes back online.
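For illustration only, here is a minimal Python sketch (not TORQUE code, which is written in C) of the general idea of bounding how long a connect attempt may block, so that a node that never answers the SYN cannot stall the caller until the kernel gives up retrying. The function name try_connect and the 5-second default are assumptions made for this example.

```python
import socket


def try_connect(host, port, timeout=5.0):
    """Return a connected socket, or None if no connection is made in time.

    Hypothetical helper for illustration; not part of pbs_server.
    """
    try:
        # create_connection applies `timeout` to the connect attempt itself,
        # so a peer that silently drops SYNs fails after `timeout` seconds.
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers timeouts (socket.timeout is an OSError subclass),
        # refused connections, and unreachable hosts.
        return None
```

A caller would treat a None result as "node down for now" and move on to other work instead of waiting on that one socket.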
On Oct 6, 2010, at 6:37 PM, bugzilla-daemon at supercluster.org wrote:
> --- Comment #5 from dbeer at adaptivecomputing.com 2010-10-06 16:37:46 MDT ---
> (In reply to comment #4)
>> Sorry you lost me. What is hanging? The server, or the node?
> the server
>> If the server is hanging because the sockets are failing, then they will fail
>> for all nodes. It's just like an out-of-memory error. Or could you please explain
>> exactly which part of the code this refers to?
> The server is hanging because the node it is in the middle of communicating
> with dies, mid-communication. Please read the first post on this ticket for
> more information.