Worst In Class Performance
SoundsSerious opened this issue · 1 comment
I've been programming with Twisted in Python since 2014, and have always felt like it was a great design that worked for a huge number of use cases.
I was shocked to see its performance (specifically Klein) near the bottom of the list in the TechEmpower benchmarks. I'm by no means an expert in Twisted or Klein, but this seems like a first-class use case to improve. What's going on here?
https://www.techempower.com/benchmarks/#section=data-r20&hw=ph&test=plaintext&l=zijzen-sf
In their benchmark code I don't see anything that seems unreasonable:
https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Python/klein/app.py
@SoundsSerious There isn't a performance suite for Klein, and at least in this configuration, the result is probably "correct", in that Klein is fairly slow out of the box. To the extent that these benchmarks show something useful, it's that they highlight a fairly naive, poorly-performing default in the way Klein gets set up.
However, looking more closely at the wild disparity between these numbers, we can see other frameworks that do advertise high performance (particularly Sanic, whose whole reason for existing is performance) doing worse than Klein, which seems odd. If you dig into the benchmark source code, you can see that by far the dominant factor here is the presence or absence of a multi-process configuration that lets the framework leverage multiple cores, both for its own CPU-bound work and for parallelism to the database.
In contrast, the Klein benchmark (in the same style as the other very poorly-performing ones) is set up as a single worker making blocking calls to the database; it doesn't even use an async DB driver. That said, the benchmarks do highlight a real gap: some frameworks give you a "workers" tunable out of the box, while Klein forces the application developer to invent their own concurrency model using primitives from Twisted, or tools like Ampoule, and that is very much not obvious. So there's definitely work to do here, but it probably isn't "optimizing" Klein until a lot of other work to simply utilize the hardware you've got has been put in place. FWIW, in the places I've used Klein in production, we did put a multiprocess worker pool in front of it, so it's far from impossible; but yes, a quick microbenchmark like this is probably always going to make us look bad.
I'm closing this because it's not specifically actionable on its own, but we'd love contributions: benchmarks we can regression-test against, performance enhancements we could integrate, and code that would let users easily configure a multi-core, multi-listener setup. So please don't take this as a rejection of "performance" broadly; I just don't want to leave vague issues lying around forever without a specific plan that defines what "done" would mean. Performance could always be better, and even if we somehow made it to the top of this chart, we wouldn't stay there automatically.