[torquedev] [Bug 85] Potential 4+ hour hang in pbs_server

bugzilla-daemon at supercluster.org
Wed Oct 6 09:17:30 MDT 2010


--- Comment #3 from dbeer at adaptivecomputing.com 2010-10-06 09:17:30 MDT ---
(In reply to comment #2)
> If creation of a socket fails (on all 880 retries) then you can't really use
> the software anyway. Sure you can fall-back after certain amount of retries,
> but does that really help you? You can't create the socket in the first place,
> therefore you will just make the server go to another request and create more
> havoc.

Actually, you can still use the software. You couldn't use it if this were
happening on every node, but if it happens on only one or two nodes out of your
entire cluster, then pbs_server hangs retrying against those nodes while the
rest of your cluster goes unused. This is why a limit can be useful.
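
To illustrate the kind of limit meant here, a minimal sketch in C follows. It
is not the pbs_server code: open_with_limit, open_fn, and the attempt and delay
parameters are hypothetical names chosen for the example. The idea is only that
the caller gets -1 back after a bounded number of tries and can move on to the
next node instead of blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical callback type: returns a descriptor >= 0, or -1 on failure. */
typedef int (*open_fn)(void *ctx);

/*
 * Call open_sock up to max_attempts times, pausing delay_ms milliseconds
 * between failed tries. Returns the descriptor on success, or -1 once the
 * limit is reached, so the caller regains control after a bounded time.
 */
int open_with_limit(open_fn open_sock, void *ctx, int max_attempts, long delay_ms)
{
    assert(open_sock != NULL && max_attempts > 0 && delay_ms >= 0);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int fd = open_sock(ctx);
        if (fd >= 0)
            return fd;

        /* Pause before the next try, but not after the final one. */
        if (delay_ms > 0 && attempt + 1 < max_attempts) {
            struct timespec pause;
            pause.tv_sec = delay_ms / 1000;
            pause.tv_nsec = (delay_ms % 1000) * 1000000L;
            nanosleep(&pause, NULL);
        }
    }
    return -1; /* limit reached: give up on this node only */
}

/* A real attempt: create a plain TCP socket. ctx is unused. */
int open_tcp_socket(void *ctx)
{
    (void)ctx;
    return socket(AF_INET, SOCK_STREAM, 0);
}

/* Stand-in for a node where creation never succeeds; ctx counts the calls. */
int always_fail(void *ctx)
{
    ++*(int *)ctx;
    return -1;
}
```

With a helper like this, a failure on one node costs at most
max_attempts * delay_ms before the server can skip that node and continue
with the rest of the cluster. What the server does with the skipped node
(retry later, mark it down) is left open here.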
