[torquedev] [Bug 85] Potential 4+ hour hang in pbs_server

bugzilla-daemon at supercluster.org
Thu Oct 7 01:40:46 MDT 2010


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=85

--- Comment #6 from Simon Toth <SimonT at mail.muni.cz> 2010-10-07 01:40:46 MDT ---
> > If the server is hanging because the sockets are failing, then they will
> > fail for all nodes. It's just like an out-of-memory error. Or could you
> > please explain exactly which part of the code this refers to?
> 
> The server is hanging because the node it is in the middle of communicating
> with dies, mid-communication. Please read the first post on this ticket for
> more information.

OK, the real issue I'm pointing out here is that we shouldn't limit the number
of retries, but instead handle the return values correctly. What exactly does
the bind() call return in this case?
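
As a rough illustration of what handling the return value could look like,
here is a minimal sketch. It is not taken from the pbs_server source: the
names bind_outcome, classify_bind_errno() and try_bind_port() are hypothetical,
and the choice of which errno values count as retryable is an assumption. The
idea is only that the caller decides what to do from errno after bind() fails,
rather than from a fixed retry count.

```c
#include <errno.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical outcome codes, not part of TORQUE. */
enum bind_outcome { BIND_OK, BIND_TRY_NEXT_PORT, BIND_FATAL };

/* Map the errno left by a failed bind() to an action.
 * Assumption: only EADDRINUSE is worth retrying on another port;
 * everything else (EACCES, EBADF, EADDRNOTAVAIL, ...) would fail
 * the same way for every port, so looping on it is pointless. */
static enum bind_outcome classify_bind_errno(int err)
{
    switch (err) {
    case EADDRINUSE:
        return BIND_TRY_NEXT_PORT;
    default:
        return BIND_FATAL;
    }
}

/* Try to bind sock to the given local port on any interface and
 * report what the caller should do next. */
static enum bind_outcome try_bind_port(int sock, unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return BIND_OK;

    return classify_bind_errno(errno);
}
```

With something like this, a loop over candidate ports would continue only on
BIND_TRY_NEXT_PORT and give up immediately on BIND_FATAL.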

-- 
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
