snapview/tungstenite-rs

benchmark result to compare with other crates

davemilter opened this issue · 1 comment

In a Reddit discussion of new WebSocket crates, benchmark results for several WebSocket crates were mentioned:

web-socket:  32.26691ms
fastwebsockets:  38.675652ms
tokio-tungstenite: 142.147423ms

The code for the benchmark is here. It is just send/receive. Maybe somebody would be interested in improving the not-so-great results for "tungstenite".

It's true that the fastwebsockets implementation from the Deno folks is faster (kudos to the authors), but not to the extent stated by the author of the aforementioned web-socket-benchmark.

I took a quick look at the benchmark and I would not call it a fair comparison - the author compared tokio-tungstenite and fastwebsockets to their own library and seems to be biased towards their own implementation. After checking the implementation and being aware of the discussion on the Reddit post, it does not feel particularly trustworthy.

However, the folks from fastwebsockets shared some benchmarks recently. I did not check the details, but it looks much closer to what I would expect to see in reality, i.e. fastwebsockets is clearly faster, but not "3-4 times faster" (not even 2 times faster).

We know that there are areas where tungstenite's performance could be improved, but unfortunately, none of the original authors/maintainers uses (or actively works with) WebSockets at the moment, so development of the performance improvements has stalled.

The required improvements are known to us and we've discussed them in the past:

  1. #36
  2. #96
  3. #209
  4. #342

I'm confident that after addressing these points the gap in performance between tokio-tungstenite and fastwebsockets will be quite negligible.

The essence of the aforementioned problems:

  • The API surface. Nowadays, tokio-tungstenite (which wraps tungstenite) is the preferred way to use tungstenite, but tungstenite's API surface does not perfectly align with what tokio-tungstenite needs: we need to split certain public functions of tungstenite and make their semantics more suitable for the async world, and we require the streams to be Read + Write, which is not necessary and severely overcomplicates tokio-tungstenite.
  • We require that messages own their data, which makes us slower in benchmarks, since the caller must hand ownership of the message to tungstenite / tokio-tungstenite (so oftentimes you'll see benchmarks doing send(message.clone()) with tokio-tungstenite while other implementations simply do send(&message)).
  • We also read in smaller chunks, so on large messages and good networks we become less efficient.
  • We also don't support SIMD and vectored writes at the moment.
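To illustrate what vectored writes buy here, the sketch below writes a frame header and its payload to a sink in a single call via std's write_vectored, instead of either copying both into a scratch buffer or issuing two separate writes (an extra syscall on a real socket). The 2-byte "header" format is invented for illustration and is not tungstenite's actual framing; a Vec<u8> stands in for a TcpStream.

```rust
use std::io::{IoSlice, Write};

// Hypothetical, simplified frame header: opcode byte + 1-byte payload length.
// Real WebSocket framing (RFC 6455) is more involved; this only shows the I/O pattern.
fn frame_header(opcode: u8, payload_len: u8) -> [u8; 2] {
    [opcode, payload_len]
}

fn main() -> std::io::Result<()> {
    let payload = b"hello";
    let header = frame_header(0x81, payload.len() as u8);

    // One vectored call submits both buffers without concatenating them first.
    // Note: on a real socket, write_vectored may write fewer bytes than requested,
    // so production code has to loop until everything is flushed.
    let mut sink: Vec<u8> = Vec::new(); // stands in for a TcpStream
    let written = sink.write_vectored(&[IoSlice::new(&header), IoSlice::new(payload)])?;

    assert_eq!(written, header.len() + payload.len());
    assert_eq!(&sink[..2], &header);
    assert_eq!(&sink[2..], payload);
    println!("wrote {} bytes in one vectored call", written);
    Ok(())
}
```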

Ideally, we would need to refactor it into a sans-I/O implementation of WebSockets and use that from tokio-tungstenite.
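The sans-I/O idea can be sketched as follows: the protocol machine never touches a socket; the caller feeds it whatever bytes were read off the wire and pulls complete frames back out, so an async wrapper and a blocking wrapper can drive the same core. The Machine type and the 1-byte length-prefixed "frame" format below are invented for illustration and are not tungstenite's actual API or framing.

```rust
// Minimal sans-I/O protocol core: pure byte-in / frame-out, no sockets.
#[derive(Default)]
struct Machine {
    buf: Vec<u8>,
}

impl Machine {
    // The I/O layer pushes whatever bytes it happened to read.
    fn feed(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }

    // The caller pulls complete frames; None means "need more bytes".
    fn next_frame(&mut self) -> Option<Vec<u8>> {
        let len = *self.buf.first()? as usize;
        if self.buf.len() < 1 + len {
            return None; // partial frame: read more and feed() again
        }
        let frame = self.buf[1..1 + len].to_vec();
        self.buf.drain(..1 + len);
        Some(frame)
    }
}

fn main() {
    let mut m = Machine::default();
    m.feed(&[4, b'p', b'i']); // bytes arrive in arbitrary chunks
    assert!(m.next_frame().is_none()); // not enough data yet
    m.feed(&[b'n', b'g']);
    assert_eq!(m.next_frame().as_deref(), Some(&b"ping"[..]));
    println!("decoded one frame");
}
```

Because the core is just a state machine over buffers, the async-specific concerns (wakeups, partial reads, backpressure) stay entirely in the wrapper, which is exactly the split tokio-tungstenite would benefit from.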