Add support for kqueue() and epoll() to event loop
giampaolo opened this issue · 11 comments
giampaolo commented
From g.rodola on January 25, 2012 18:35:26
Right now the internal poller depends on the asyncore module; as such it can
only use the select() and poll() system calls, which don't scale/perform well
with thousands of concurrent clients.
This is a benchmark using poll():
pyftpdlib 0.7.0:
2000 concurrent clients (connect, login) 36.63 secs
2000 concurrent clients (RETR 10M file) 128.07 secs
2000 concurrent clients (STOR 10M file) 189.73 secs
2000 concurrent clients (quit) 0.39 secs
proftpd 1.3.4rc2:
2000 concurrent clients (connect, login) 44.59 secs
2000 concurrent clients (RETR 10M file) 33.90 secs
2000 concurrent clients (STOR 10M file) 138.94 secs
2000 concurrent clients (quit) 2.28 secs
2000 clients here actually means 4000 concurrent connections (control + data).
As the numbers show, poll() clearly suffers serious performance degradation.
select(), on the other hand, would not have worked at all, as it is limited to
1024 fds.
epoll() (Linux) and kqueue() (BSD / OSX) are supposed to fix these problems
altogether.
What I have in mind (for version 1.0.0) is to add a "lib" package with a
modified version of asyncore.dispatcher and an asyncore.loop supporting
kqueue()/epoll().
A partial patch I wrote some time ago is here: http://bugs.python.org/issue6692
Also, tornado ( http://www.tornadoweb.org/ ) can be used as an example for the
epoll() implementation.
Original issue: http://code.google.com/p/pyftpdlib/issues/detail?id=203
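For readers unfamiliar with the interface, here is a minimal sketch of the readiness-notification pattern that epoll() exposes through Python's `select` module. It is neither the attached patch nor tornado's code; the helper name `wait_for_readable` is made up for illustration, and the function simply returns None on platforms without `select.epoll`.

```python
import select
import socket


def wait_for_readable(timeout=1.0):
    """Register one socket with epoll and read from it once it is ready.

    Hypothetical helper for illustration only; returns None where
    select.epoll is not available (anything but Linux).
    """
    if not hasattr(select, "epoll"):
        return None
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        # The fd is registered once; the kernel keeps the interest list,
        # so it is not rebuilt and passed in again on every call.
        ep.register(b.fileno(), select.EPOLLIN)
        a.send(b"ping")
        received = None
        # poll() returns only the fds that are actually ready.
        for fd, events in ep.poll(timeout):
            if fd == b.fileno() and events & select.EPOLLIN:
                received = b.recv(1024)
        ep.unregister(b.fileno())
        return received
    finally:
        ep.close()
        a.close()
        b.close()
```

The point of the pattern is that the cost of each poll() call depends on the number of ready descriptors rather than on the number of registered ones.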
giampaolo commented
From g.rodola on January 28, 2012 08:23:42
A preliminary patch is in attachment.
=== before patch (poll()) ===
giampaolo@ubuntu:~/svn/pyftpdlib$ python test/bench.py -u giampaolo -p XXX -b concurrence -s 1K -n 2000
2000 concurrent clients (connect, login) 34.98 secs
2000 concurrent clients (RETR 1K file) 61.02 secs
2000 concurrent clients (STOR 1K file) 169.42 secs
2000 concurrent clients (quit) 0.11 secs
=== after patch (epoll()) ===
giampaolo@ubuntu:~/svn/pyftpdlib$ python test/bench.py -u giampaolo -p XXX -b concurrence -s 1K -n 2000
2000 concurrent clients (connect, login) 19.46 secs
2000 concurrent clients (RETR 1K file) 24.29 secs
2000 concurrent clients (STOR 1K file) 122.09 secs
2000 concurrent clients (quit) 0.10 secs
Attachment: ioloop.patch
giampaolo commented
From g.rodola on February 17, 2012 11:45:58
Patch in attachment adds kqueue() support (BSD and OSX systems).
Attachment: kqueue.patch
giampaolo commented
From g.rodola on February 28, 2012 08:50:49
Updated patch in attachment.
CHANGES:
- got rid of serve_forever()'s "use_poll" and "count" arguments; replaced with
a new "blocking" argument defaulting to True
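To illustrate what a "blocking" flag could mean in practice, here is a toy sketch. It assumes that blocking=False runs a single iteration, much like passing count=1 to asyncore.loop(); the patch itself may behave differently, and `ToyLoop` and `poll_once` are hypothetical names.

```python
class ToyLoop:
    """Stand-in for an IO loop; every name here is hypothetical."""

    def __init__(self, pending_events):
        self.pending = list(pending_events)
        self.handled = []

    def poll_once(self, timeout):
        # A real loop would wait up to `timeout` seconds on the poller here.
        if self.pending:
            self.handled.append(self.pending.pop(0))

    def serve_forever(self, timeout=1.0, blocking=True):
        if blocking:
            # Keep polling until there is nothing left to serve.
            while self.pending:
                self.poll_once(timeout)
        else:
            # One iteration only, so the caller can drive the loop itself.
            self.poll_once(timeout)
```

With blocking=False the caller can interleave the server with its own work, for example inside another event loop or a test.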
TODO:
- kqueue() uses a hack for accepting sockets
- epoll()/poll() currently check for error fds in order to detect closed
connections but this might not be necessary (twisted doesn't do that)
- on the other hand, select() on Windows might need to do that
Attachment: ioloop.patch
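The question about error fds in the TODO list can be explored with a small experiment. The sketch below registers a socket for reading only, closes its peer, and reports what poll() returns; `events_after_peer_close` is a made-up helper, it needs `select.poll` (so it does not run on Windows), and the exact flags that come back vary by platform.

```python
import select
import socket


def events_after_peer_close(timeout=1.0):
    """Return (poll event mask, bytes read) for a socket whose peer closed.

    Illustrative helper only, not part of any patch.
    """
    a, b = socket.socketpair()
    poller = select.poll()
    try:
        # Only read interest is requested; POLLERR and POLLHUP are reported
        # by the kernel whether or not they are asked for.
        poller.register(b.fileno(), select.POLLIN)
        a.close()
        ready = poller.poll(int(timeout * 1000))  # poll() takes milliseconds
        mask = ready[0][1] if ready else 0
        # A closed peer shows up as end-of-file: recv() returns b"".
        data = b.recv(1024) if mask else None
        return mask, data
    finally:
        a.close()
        b.close()
```

Either way the handler can learn about the disconnect: through the hangup/error bits, or through a read that returns an empty string. That is one reason an explicit error check may turn out to be redundant for poll() and epoll().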
giampaolo commented
From g.rodola on March 02, 2012 14:23:39
Ok, I think this is done.
Here's a summary to clarify what I've done.
Before the patch
================
- The IO loop was based on the asyncore stdlib module, which only supports
select() and poll().
- These are known to scale/perform reasonably well below a thousand concurrent
connections; beyond that they start to show performance degradation (poll()) or
don't work at all (select()).
- asyncore's IO poller is also particularly naive in that every registered file
descriptor is checked for both read and write operations, even for idle
connections.
- That means that with 200 connected clients we iterate over a list of 400 (200
* 2) elements on every loop.
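The per-loop cost described above can be modelled with a few lines. This is a simplified stand-in for an asyncore-style loop, not asyncore's actual source; `Channel` and `build_fd_lists` are hypothetical names.

```python
class Channel:
    """Toy stand-in for an asyncore-style dispatcher (hypothetical)."""

    def __init__(self, wants_read=False, wants_write=False):
        self.wants_read = wants_read
        self.wants_write = wants_write

    def readable(self):
        return self.wants_read

    def writable(self):
        return self.wants_write


def build_fd_lists(socket_map):
    """Rebuild the read/write lists from scratch, as done on every loop."""
    readers, writers, checks = [], [], 0
    for fd, channel in socket_map.items():
        checks += 2  # one readable() call and one writable() call per fd
        if channel.readable():
            readers.append(fd)
        if channel.writable():
            writers.append(fd)
    return readers, writers, checks
```

With 200 channels in `socket_map`, `checks` comes out at 400 on every iteration, no matter how many of those channels are idle.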
After the patch
===============
- The IO loop has been rewritten from scratch and now supports epoll() and
kqueue() on Linux and OSX/BSD.
- epoll() and kqueue() scale/perform better with thousands of connections.
- asyncore's original select() and poll() implementations were rewritten.
- The poller is smarter in that it only iterates on fds which are actually
interested in either reading or writing.
- That means that with 200 clients, all idle except one, we iterate over a
list of 1 element instead of 400.
- This is valid for all pollers, including select().
- By default we use the best poller available for the platform:
  - Linux: epoll()
  - OSX/BSD: kqueue()
  - all other POSIX: poll()
  - Windows: select()
- FTPServer.serve_forever() signature has changed.
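The interest tracking and the platform defaults listed above can be sketched as follows. This is a minimal illustration, not the committed code: `SelectPoller` and `default_poller_name` are made-up names, and choosing the default by feature detection on the `select` module is an assumption about how the platform table might be implemented.

```python
import select
import socket


class SelectPoller:
    """Toy select()-based poller that tracks interest explicitly."""

    def __init__(self):
        self._readers = set()
        self._writers = set()

    def register(self, fd, read=False, write=False):
        # Replace any previous interest for this fd.
        self.unregister(fd)
        if read:
            self._readers.add(fd)
        if write:
            self._writers.add(fd)

    def unregister(self, fd):
        self._readers.discard(fd)
        self._writers.discard(fd)

    def watched(self):
        # Number of entries handed to select() on the next poll.
        return len(self._readers) + len(self._writers)

    def poll(self, timeout):
        if not self.watched():
            return [], []
        r, w, _ = select.select(list(self._readers), list(self._writers),
                                [], timeout)
        return r, w


def default_poller_name():
    """Pick a poller in the order listed above, by feature detection."""
    for name in ("epoll", "kqueue", "poll"):
        if hasattr(select, name):
            return name
    return "select"
```

Connections that are neither waiting to read nor waiting to write are never registered, so they add nothing to the lists passed to select(), which is why the same gain applies to every poller.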
Final benchmark
===============
=== old select() implementation ===
200 concurrent clients (connect, login) 0.96 secs
STOR (1 file with 200 idle clients) 81.94 MB/sec
RETR (1 file with 200 idle clients) 89.01 MB/sec
200 concurrent clients (RETR 10M file) 2.80 secs
200 concurrent clients (STOR 10M file) 6.65 secs
200 concurrent clients (QUIT) 0.02 secs
=== new select() implementation ===
200 concurrent clients (connect, login) 0.78 secs
STOR (1 file with 200 idle clients) 399.46 MB/sec
RETR (1 file with 200 idle clients) 761.53 MB/sec
200 concurrent clients (RETR 10M file) 2.22 secs
200 concurrent clients (STOR 10M file) 5.79 secs
200 concurrent clients (QUIT) 0.01 secs
=== epoll() implementation ===
200 concurrent clients (connect, login) 0.77 secs
STOR (1 file with 200 idle clients) 535.83 MB/sec
RETR (1 file with 200 idle clients) 1632.50 MB/sec
200 concurrent clients (RETR 10M file) 2.24 secs
200 concurrent clients (STOR 10M file) 5.82 secs
200 concurrent clients (QUIT) 0.02 secs
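To make the single-transfer figures easier to compare, the throughput ratios can be computed directly from the numbers above. The snippet only restates the benchmark values; the `speedup` helper is introduced here for illustration.

```python
# Throughput with 200 idle clients, in MB/sec, copied from the tables above.
results = {
    "old select()": {"STOR": 81.94, "RETR": 89.01},
    "new select()": {"STOR": 399.46, "RETR": 761.53},
    "epoll()": {"STOR": 535.83, "RETR": 1632.50},
}


def speedup(impl, op, baseline="old select()"):
    """Ratio of an implementation's throughput to the baseline's."""
    return results[impl][op] / results[baseline][op]


for impl in ("new select()", "epoll()"):
    for op in ("STOR", "RETR"):
        print(f"{impl:13} {op}: {speedup(impl, op):.1f}x")
```

The ratios range from roughly 5x to 18x for a single transfer among idle clients, while the timings with 200 concurrently active clients change far less.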
Further note
============
A patch which can be applied to the current 0.7.0 version is in attachment.
Attachment: ioloop.patch
giampaolo commented
From g.rodola on May 23, 2012 08:18:13
This is now committed in r1049.
Status: FixedInSVN
Labels: Milestone-1.0.0
giampaolo commented
From nagy.att...@gmail.com on July 16, 2012 11:08:42
Thank you very much for this! I've just begun to port my SMTP server from
Python's default asyncore to your lib, and exactly the same code shows a
substantial speedup.
Previously a dummy SMTP sink could do around 70 MiBps (std asyncore with poll);
with your IO loop (FreeBSD, kqueue) it does around 110.
The same logic in twisted can do about 20...