[torquedev] Serious DOS problem on server

Ken Nielson knielson at adaptivecomputing.com
Wed Aug 24 10:30:56 MDT 2011



----- Original Message -----
> From: "Mgr. Šimon Tóth" <toth at fi.muni.cz>
> To: "Torque Developers mailing list" <torquedev at supercluster.org>
> Sent: Wednesday, August 24, 2011 8:14:42 AM
> Subject: [torquedev] Serious DOS problem on server
> After tracing performance issues I have found a real DOS issue.
> 
> The problem is in DIS_tcp_wflush.
> 
> The write can hang. Now the fact that the write is blocking doesn't
> help, but a non-blocking write wouldn't solve the issue either. A
> sufficiently slow counterpart can keep the server locked in this
> function for a long time (hours) without triggering communication
> timeouts.
> 
> I have a quick and hard fix using alarm, but I'm also looking into
> making reply_send() async. This function is nicely separated from the
> rest of the code, so making it run in a separate thread with a watcher
> task shouldn't be a big issue and shouldn't create any race conditions
> with the rest of the code.
> 
> What do you think?

I think this makes sense. I look forward to your solution.
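
For reference, one possible shape for a bounded flush is sketched below. It
is only an illustration under stated assumptions, not the alarm-based fix or
the threaded reply_send() described above: flush_with_deadline() and now_ms()
are hypothetical names that do not exist in TORQUE, and the sketch assumes a
POSIX system with poll() and a monotonic clock. The point it tries to show is
that the time limit covers the whole buffer rather than each write() call, so
a peer that drains a few bytes at a time still runs into the deadline.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds on the monotonic clock, unaffected by wall-clock changes. */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper, not part of TORQUE: write len bytes from buf to fd,
 * giving up once total_ms milliseconds have passed for the whole buffer.
 * Switches fd to non-blocking mode and leaves it that way.
 * Returns 0 on success, -1 on error or timeout (errno set to ETIMEDOUT).
 */
int flush_with_deadline(int fd, const char *buf, size_t len, int total_ms)
{
    long long deadline = now_ms() + total_ms;
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    while (len > 0) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
        long long left = deadline - now_ms();
        ssize_t n;
        int rc;

        /* The budget is shared by all iterations, so slow progress still expires. */
        if (left <= 0) {
            errno = ETIMEDOUT;
            return -1;
        }

        /* Wait until the descriptor accepts data or the remaining time runs out. */
        rc = poll(&pfd, 1, (int)left);
        if (rc == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (rc == 0) {
            errno = ETIMEDOUT;
            return -1;
        }

        n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;
            return -1;
        }

        buf += n;
        len -= (size_t)n;
    }

    return 0;
}
```

A caller would pass the connection's descriptor and a total budget, for
example flush_with_deadline(sock, data, data_len, 5000), and treat a -1 return
with errno set to ETIMEDOUT as a dead or too-slow peer. The sketch leaves the
descriptor in non-blocking mode and does not handle SIGPIPE, both of which a
real change would have to deal with.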


Ken

