[torquedev] Serious DOS problem on server

"Mgr. Šimon Tóth" toth at fi.muni.cz
Wed Aug 24 08:14:42 MDT 2011


After tracing performance issues I have found a real denial-of-service
(DoS) issue.

The problem is in DIS_tcp_wflush().

The write can hang. The fact that the write is blocking doesn't help,
but a non-blocking write would not solve the issue either. A sufficiently
slow peer can keep the server locked in this function for a long
time (hours) without triggering the communication timeouts.

I have a quick and hard fix using alarm, but I'm also looking into
making reply_send() async. This function is nicely separated from the
rest of the code, so making it run in a separate thread with a watcher
task shouldn't be a big issue and shouldn't create any race conditions
with the rest of the code.

What do you think?

Btw. I know about the threaded implementations in trunk, and I maintain
that they are overzealous (our server spends 40%-80% of its time in TCP
write).

-- 
Mgr. Simon Toth

