benchmark result to compare with other crates
davemilter opened this issue · 1 comment
A Reddit discussion of new WebSocket crates mentioned these benchmark results for WebSocket crates:
web-socket: 32.26691ms
fastwebsockets: 38.675652ms
tokio-tungstenite: 142.147423ms
The code for the benchmark is here. It is just send/receive. Maybe somebody would be interested in improving the not-so-great results for `tungstenite`.
It's true that the `fastwebsockets` implementation from the Deno folks is faster (kudos to the authors), but not to the extent that has been stated by the author of the aforementioned `web-socket` benchmark.
I took a quick look at the benchmark and I would not call it a fair comparison: the author compared `tokio-tungstenite` and `fastwebsockets` to their own library and seems to be biased towards their own implementation. After checking the implementation and being aware of the discussion on the Reddit post, it does not feel particularly trustworthy.
However, the folks from `fastwebsockets` shared some benchmarks recently. I did not check the details, but it looks much closer to what I would expect to see in reality, i.e. `fastwebsockets` is clearly faster, but not "3-4 times faster" (not even 2 times faster).
We know there are areas where `tungstenite`'s performance could be improved, but unfortunately none of the original authors/maintainers uses (or actively works with) WebSockets at the moment, so development of the performance improvements has stalled.
The required improvements are known to us and we've discussed them in the past:
I'm confident that after addressing these points, the gap in performance between `tokio-tungstenite` and `fastwebsockets` will be quite negligible.
The essence of the aforementioned problems:
- The API surface. Nowadays, `tokio-tungstenite` (which wraps `tungstenite`) is the preferred way to use `tungstenite`, but `tungstenite`'s API surface does not perfectly align with what we need for `tokio-tungstenite`: we need to split certain public functions of `tungstenite` and make their semantics more suitable for the async world, and we currently demand the streams to be `Read + Write`, which is not necessary and severely overcomplicates `tokio-tungstenite`. We also demand that incoming messages own their data, which makes us slower in benchmarks, since the caller must own the message before passing it to `tungstenite`/`tokio-tungstenite` (so oftentimes you'll see benchmarks doing `send(message.clone())` with `tokio-tungstenite` while other implementations would simply do `send(&message)`). Also, we read in smaller chunks, so on large messages and fast networks we get less efficient.
- We also don't support SIMD and vectored writes at the moment.
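To illustrate the ownership point, here is a minimal, self-contained sketch. The types below are invented for illustration (they are not real `tungstenite` or `fastwebsockets` APIs): a sink whose `send` takes the message by value forces a clone per iteration in the typical broadcast/benchmark loop, while a by-reference `send` does not.

```rust
// Hypothetical types for illustration only -- not real tungstenite APIs.
#[derive(Clone)]
struct Message(Vec<u8>);

// Mimics the current shape: `send` consumes the message, so reusing it
// across iterations requires a clone (allocation + copy every time).
struct OwnedSink { bytes_sent: usize }
impl OwnedSink {
    fn send(&mut self, msg: Message) { self.bytes_sent += msg.0.len(); }
}

// Mimics the proposed shape: `send` borrows, the caller keeps the message.
struct BorrowedSink { bytes_sent: usize }
impl BorrowedSink {
    fn send(&mut self, msg: &Message) { self.bytes_sent += msg.0.len(); }
}

fn run(iterations: usize, payload_len: usize) -> (usize, usize) {
    let msg = Message(vec![0u8; payload_len]);

    let mut owned = OwnedSink { bytes_sent: 0 };
    for _ in 0..iterations {
        owned.send(msg.clone()); // fresh allocation + memcpy on every send
    }

    let mut borrowed = BorrowedSink { bytes_sent: 0 };
    for _ in 0..iterations {
        borrowed.send(&msg); // no clone needed
    }

    (owned.bytes_sent, borrowed.bytes_sent)
}

fn main() {
    let (a, b) = run(100, 1024);
    // Same bytes reach the wire either way; only the allocation cost differs.
    assert_eq!(a, b);
    println!("owned: {a} bytes, borrowed: {b} bytes");
}
```

The benchmark-visible difference is purely the per-send `clone()`; the protocol work is identical in both shapes.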
Ideally, we would refactor it into a sans-I/O implementation of WebSockets and use that from `tokio-tungstenite`.
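For context, "sans-I/O" means the protocol state machine only transforms bytes and never touches a socket, so the same core can back both sync and async wrappers. A toy sketch of that shape, using a made-up 4-byte length-prefixed framing (deliberately not RFC 6455 framing):

```rust
// Toy sans-I/O codec: no sockets, no async -- just bytes in, bytes out.
// The framing is a made-up 4-byte big-endian length prefix, NOT real
// WebSocket framing; it only demonstrates the separation of concerns.
struct Codec;

impl Codec {
    /// Encode a payload into `out`; the caller decides how and when to flush.
    fn encode(&self, payload: &[u8], out: &mut Vec<u8>) {
        out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
        out.extend_from_slice(payload);
    }

    /// Try to decode one frame from `buf`. Returns the payload and the
    /// number of bytes consumed, or None if more data is needed (the
    /// caller -- sync or async -- reads more bytes and retries).
    fn decode<'a>(&self, buf: &'a [u8]) -> Option<(&'a [u8], usize)> {
        let len_bytes: [u8; 4] = buf.get(..4)?.try_into().ok()?;
        let len = u32::from_be_bytes(len_bytes) as usize;
        let payload = buf.get(4..4 + len)?;
        Some((payload, 4 + len))
    }
}

fn main() {
    let codec = Codec;
    let mut wire = Vec::new();
    codec.encode(b"hello", &mut wire);

    // Incomplete input: the state machine just reports "need more bytes".
    assert!(codec.decode(&wire[..3]).is_none());

    let (payload, consumed) = codec.decode(&wire).unwrap();
    assert_eq!(payload, b"hello");
    assert_eq!(consumed, 9); // 4-byte prefix + 5-byte payload
}
```

Because nothing here blocks or awaits, `tokio-tungstenite` could drive such a core from async code while a plain blocking wrapper drives the exact same core from sync code.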