[torqueusers] Serious torque failure problems
garrick at usc.edu
Mon Aug 15 13:10:20 MDT 2005
On Fri, Aug 12, 2005 at 07:59:32AM +1000, David Singleton alleged:
> Ummm, I don't think these are the issues in PBS UDP vs TCP. The original
> PBS authors wrote RPP (Reliable Packet Protocol) on top of UDP. My
> belief is that they did this to get asynchronous messaging between
> daemons. The RPP layer has acks, retries, etc. built in, but daemons do not
> block on RPP requests. Blocking TCP requests will hang for a TCP timeout
> period if the other end is not responding. RPP also avoids issues
> with limits on large numbers of sockets, although that may be less
> of a problem now.
Just as a data point for everyone... I use RPP just fine with 1700+ nodes.
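
For anyone following along, here is a rough sketch of the idea David
describes: acks and retries layered over datagrams, with the daemon polling
from its main loop instead of blocking on a reply. This is not TORQUE's RPP
code. The class name, method names, and parameters are all made up for
illustration, and the actual network send is left to the caller.

```python
# Hypothetical sketch of ack/retry bookkeeping for a datagram protocol.
# NOT TORQUE's RPP implementation; every name here is an assumption.

class RetryQueue:
    def __init__(self, retry_interval=2.0, max_retries=3):
        self.retry_interval = retry_interval  # seconds between retransmits
        self.max_retries = max_retries        # retransmits before giving up
        self.pending = {}  # seq -> [payload, next_deadline, retries_left]
        self.next_seq = 0

    def send(self, payload, now):
        """Record a packet as unacknowledged and return (seq, payload).

        The caller transmits it (e.g. with a UDP sendto) and carries on;
        nothing here waits for the peer.
        """
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = [payload, now + self.retry_interval, self.max_retries]
        return seq, payload

    def ack(self, seq):
        """The peer acknowledged seq, so stop retransmitting it."""
        self.pending.pop(seq, None)

    def poll(self, now):
        """Call once per main-loop pass. Returns (resend, failed).

        resend: list of (seq, payload) whose deadline passed; send them again.
        failed: list of seq numbers that ran out of retries.
        """
        resend, failed = [], []
        for seq in list(self.pending):
            payload, deadline, retries_left = self.pending[seq]
            if now < deadline:
                continue
            if retries_left == 0:
                failed.append(seq)
                del self.pending[seq]
            else:
                self.pending[seq] = [payload, now + self.retry_interval,
                                     retries_left - 1]
                resend.append((seq, payload))
        return resend, failed
```

The point of the design is that an unresponsive peer only costs an entry in
the pending table until its retries run out, whereas a blocking TCP request
would stall the caller for the full timeout.
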
Garrick Staples, Linux/HPCC Administrator
University of Southern California