[torquedev] Serious DOS problem on server
"Mgr. Šimon Tóth"
toth at fi.muni.cz
Wed Aug 24 08:14:42 MDT 2011
After tracing performance issues I have found a real DoS issue.
The problem is in DIS_tcp_wflush().
The write can hang. The fact that the write is blocking doesn't help,
but a non-blocking write wouldn't solve the issue either. A sufficiently
slow peer can keep the server locked in this function for a long
time (hours) without triggering communication timeouts.
I have a quick and hard fix using alarm, but I'm also looking into
making reply_send() async. This function is nicely separated from the
rest of the code, so making it run in a separate thread with a watcher
task shouldn't be a big issue and shouldn't create any race conditions
with the rest of the code.
What do you think?
Btw. I know about the threaded implementations in trunk, and I maintain
that they are overzealous (our server spends 40%-80% of time in tcp write).
Mgr. Simon Toth