[torqueusers] 100+ job launch failures - 15009 errors.

Andrew J Caird acaird at umich.edu
Wed Nov 7 06:44:42 MST 2007

On Tue, 6 Nov 2007, Garrick Staples wrote:

> On Tue, Nov 06, 2007 at 07:27:39PM -0500, Andrew J Caird alleged:
>> "For large systems (in excess of 300 nodes) it is often valuable to 
>> build TORQUE using TCP for inter-daemon communication rather than the 
>> default of RPP (reliable packet protocol).  This can be accomplished 
>> using the '--disable-rpp' configure option."
>> etc.
>> Is that not true still?
> The statement is false because --disable-rpp doesn't affect inter-mom 
> communication or inter-server communication (the two forms of 
> inter-daemon communication).  It only affects resource requests.
> Back in the OpenPBS days, it was common for schedulers to do lots of 
> resource requests directly to the MOMs.  One of the earliest TORQUE 
> patches obsoleted that mechanism for schedulers.  Now momctl is the only 
> program that issues resource requests.
> In fact, the TCP resource requests have a bug in that sockets are never 
> closed, so doing lots and lots of requests tends to run out of sockets.

Can someone with Wiki access update the documentation to this effect?
