[torquedev] [Bug 85] Potential 4+ hour hang in pbs_server

Matthew Britt msbritt at umich.edu
Wed Oct 6 20:51:17 MDT 2010


The server hangs: it doesn't log anything and doesn't respond to any client commands. In every case I have checked, there is a socket open to the down client node in SYN_WAIT. The server loops trying new ports and only becomes responsive again when either it is restarted or the down client node is restarted and comes back online.
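For illustration only, here is one general way a daemon can avoid blocking indefinitely in connect() to a node that has gone down: put the socket in non-blocking mode and bound the TCP handshake with poll(). This is a sketch under stated assumptions, not TORQUE's code and not necessarily how this bug should be fixed. The helper name connect_with_timeout and the timeout values are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not a TORQUE function): open a TCP connection to addr,
   waiting at most timeout_ms for the handshake. Returns a connected, blocking
   socket, or -1 with errno set (ETIMEDOUT if the peer never answered). */
int connect_with_timeout(const struct sockaddr *addr, socklen_t addrlen, int timeout_ms)
{
    assert(addr != NULL && timeout_ms >= 0);

    int fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Non-blocking mode makes connect() return EINPROGRESS immediately instead
       of waiting until the kernel gives up retransmitting SYNs. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, addr, addrlen) < 0) {
        if (errno != EINPROGRESS)
            goto fail;

        /* Wait at most timeout_ms for the socket to become writable. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int rc;
        do {
            rc = poll(&pfd, 1, timeout_ms);
        } while (rc < 0 && errno == EINTR);
        if (rc == 0) {
            errno = ETIMEDOUT;
            goto fail;
        }
        if (rc < 0)
            goto fail;

        /* Writable only means the attempt finished; SO_ERROR says how. */
        int soerr = 0;
        socklen_t len = sizeof soerr;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0)
            goto fail;
        if (soerr != 0) {
            errno = soerr;
            goto fail;
        }
    }

    /* Hand the caller an ordinary blocking socket again. */
    if (fcntl(fd, F_SETFL, flags) < 0)
        goto fail;
    return fd;

fail: {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
}
```

With a bounded timeout, an attempt to reach a dead node costs the caller at most roughly timeout_ms rather than stalling the whole server. Reads and writes on an already-connected socket (the "node dies mid-communication" case) would need their own timeouts, which this sketch does not cover.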

 - matt




On Oct 6, 2010, at 6:37 PM, bugzilla-daemon at supercluster.org wrote:

> http://www.clusterresources.com/bugzilla/show_bug.cgi?id=85
> 
> --- Comment #5 from dbeer at adaptivecomputing.com 2010-10-06 16:37:46 MDT ---
> (In reply to comment #4)
> 
>> 
>> Sorry, you lost me. What is hanging? The server, or the node?
>> 
> 
> the server
> 
>> If the server is hanging because the sockets are failing, then they will fail
>> for all nodes. It's just like an out-of-memory error. Or could you please
>> explain exactly what part of the code this refers to?
> 
> The server hangs because the node it is communicating with dies
> mid-communication. Please read the first post on this ticket for
> more information.


