Performance Improvements
saviorand opened this issue · 8 comments
Parallelization and performance optimizations
Woah, nice! Thanks for testing! Yes, the welcome handler serves an html page with an image, which might be slower. Can I ask how you're profiling this? The charts look sick
The profile was taken with Linux's built-in kernel profiler and the "perf" user-mode tool; I couldn't find a profiler specifically for Mojo yet. This technique does have the advantage of showing all user-mode and kernel-mode activity, i.e. the libc and CPython work.
I suspect there is a lot of memory allocation or copying happening in the welcome handler, but I'm not all that familiar with Mojo and haven't found a technique to profile memory allocation.
I also suspect the use of Python sockets might be suboptimal, but what do I know?
The flame graph was made with the FlameGraph scripts described at https://www.brendangregg.com/perf.html
git clone https://github.com/brendangregg/FlameGraph
cd FlameGraph
sudo perf record -F99 -g -p `pgrep lightbug` -- sleep 60
sudo perf script | ./stackcollapse-perf.pl > out.perf-folded
./flamegraph.pl out.perf-folded > perf.svg
google-chrome perf.svg
You might also enjoy perf top.
Yeah, 1500 req/s with the base64 image removed.
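If you want numbers rather than a picture, the out.perf-folded file from the commands above can be summarized directly. Here is a minimal Python sketch, assuming the usual folded-stack layout of one line per stack, frames joined by ";" and a trailing sample count. The function names and the sample stacks are made up for illustration and do not come from the actual lightbug profile.

```python
from collections import Counter


def leaf_sample_counts(folded_lines):
    """Sum sample counts per leaf (innermost) frame of each folded stack."""
    counts = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # A folded line looks like "comm;frame1;...;frameN <count>".
        # Frame names may contain spaces, so split on the last space only.
        stack, _, count = line.rpartition(" ")
        leaf = stack.rsplit(";", 1)[-1]
        counts[leaf] += int(count)
    return counts


def top_frames(folded_lines, n=10):
    """Return (frame, samples, share of all samples) for the n hottest leaves."""
    counts = leaf_sample_counts(folded_lines)
    total = sum(counts.values())
    return [(frame, samples, samples / total)
            for frame, samples in counts.most_common(n)]


# Hypothetical stacks, only to show the input shape.
sample = [
    "lightbug;serve;handle;malloc 30",
    "lightbug;serve;handle;memcpy 50",
    "lightbug;serve;recv 20",
]

for frame, samples, share in top_frames(sample):
    print(f"{frame:<10} {samples:>5} {share:6.1%}")
```

On a real profile you would pass open("out.perf-folded") instead of the sample list. A large share for frames such as malloc or memcpy would support the allocation and copying suspicion, though leaf counts alone do not show which caller is responsible.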
@crunchy-vonage we're actually making external_call calls to C in the Mojo server implementation in the sys folder (this one is enabled by default) and not talking to Python! Python is only invoked in the separate Python implementation in the python folder.
I've made some improvements in #40, and wrk now reports 10468 requests in a 1.10s run (about 9.5k requests per second). wrk is the tool used, among other things, for TechEmpower benchmarks. I have a fork for potential submission here, but the performance is not satisfying enough yet, and we don't even have the JSON serialization support needed to submit it to the listing. It would be cool if we could make an entry at some point though.
Running 1s test @ http://localhost:8080
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.67ms   11.33ms  78.56ms   94.60%
    Req/Sec     9.56k     2.40k   11.29k    72.73%
  Latency Distribution
     50%   53.00us
     75%   58.00us
     90%   98.00us
     99%   66.70ms
  10468 requests in 1.10s, 1.59MB read
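As a sanity check on the throughput, the rate can be worked out from the summary line itself, since the total request count and the elapsed time are both printed. A small Python sketch, where the helper name and the unit table are my own assumptions about how wrk formats durations:

```python
import re

# Assumed duration suffixes in wrk's summary line, mapped to seconds.
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}


def requests_per_second(summary_line):
    """Derive the request rate from a line like '10468 requests in 1.10s, ...'."""
    match = re.search(r"(\d+) requests in ([\d.]+)(us|ms|s|m|h)\b", summary_line)
    if match is None:
        raise ValueError(f"not a wrk summary line: {summary_line!r}")
    requests = int(match.group(1))
    elapsed = float(match.group(2)) * UNIT_SECONDS[match.group(3)]
    return requests / elapsed


line = "10468 requests in 1.10s, 1.59MB read"
print(f"{requests_per_second(line):.0f} req/s")
```

For the run above this comes out to roughly 9.5k requests per second, which lines up with the 9.56k Req/Sec thread average. With a 1s test, a single thread and a single connection, the figure is noisy, so a longer run would give a steadier number.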