decompression: worse throughput when using `tower_http::decompression` than manual impl with `async-compression`
- I have looked for existing issues (including closed) about this
Bug Report
I've seen that, in some situations, decompression throughput gets significantly worse when using `tower_http::decompression` compared to manually implementing similar logic with the `async-compression` crate.
Version
Platform
Apple silicon macOS
(Not 100% sure, but it should happen on Linux as well.)
Description
In Deno, we switched the internal implementation of `fetch` (the JavaScript API) from being `reqwest`-based to `hyper-util`-based.
The `hyper-util`-based implementation uses `tower_http::decompression` to decompress the fetched data when necessary. Note that `reqwest` does not use `tower_http`.
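For context, here is a minimal sketch (not Deno's actual code) of how a `hyper-util` client is typically wrapped with `tower_http`'s decompression layer. The crate features (`tower_http`'s `decompression-gzip`, `hyper-util`'s legacy client, tokio's macros) and the trivial request body type are assumptions made for the example:

```rust
use http::{header, Request};
use http_body_util::Full;
use hyper_util::{client::legacy::Client, rt::TokioExecutor};
use tower::{ServiceBuilder, ServiceExt};
use tower_http::decompression::DecompressionLayer;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Plain hyper-util client (plaintext HTTP only, to keep the sketch short).
    let client = Client::builder(TokioExecutor::new()).build_http::<Full<bytes::Bytes>>();

    // DecompressionLayer looks at the response's Content-Encoding header and
    // transparently wraps the body in a decoder.
    let svc = ServiceBuilder::new()
        .layer(DecompressionLayer::new())
        .service(client);

    let req = Request::builder()
        .uri("http://example.com/")
        .header(header::ACCEPT_ENCODING, "gzip")
        .body(Full::new(bytes::Bytes::new()))?;

    // The body of `resp` yields already-decompressed bytes.
    let resp = svc.oneshot(req).await?;
    println!("status = {}", resp.status());
    Ok(())
}
```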
After this change, we started to see degraded throughput, especially when the server serves large compressed data. Look at the following graph, which shows how long each Deno version takes to complete 2k requests, where Deno fetches compressed data from an upstream server and forwards it to the end client.
v1.45.2 is the last version before we switched to the hyper-based `fetch` implementation. Since v1.45.3, where we landed the switch, the throughput has been about 10x worse.
I then identified `tower_http::decompression` as the cause of this issue, and found that if we implement the decompression logic by using the `async-compression` crate directly, performance returns to what it was. (See denoland/deno#25800 for how the manual implementation with `async-compression` affects performance.)
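For comparison, here is a minimal sketch of the manual approach, assuming a gzip-encoded body whose chunks have already been turned into a stream of `Bytes` with `std::io::Error` errors (the function name and signature are mine, not the actual patch; `async-compression` with the `tokio` + `gzip` features and `tokio-util` with the `io` feature are assumed):

```rust
use async_compression::tokio::bufread::GzipDecoder;
use bytes::Bytes;
use futures_util::Stream;
use tokio_util::io::{ReaderStream, StreamReader};

/// Take a stream of gzip-compressed body chunks (e.g. obtained from a hyper
/// response body via `http_body_util::BodyStream`) and return a stream of
/// decompressed chunks.
fn gzip_decode_stream<S>(compressed: S) -> impl Stream<Item = std::io::Result<Bytes>>
where
    S: Stream<Item = std::io::Result<Bytes>>,
{
    // Stream of Bytes -> AsyncBufRead
    let reader = StreamReader::new(compressed);
    // AsyncBufRead -> decompressing AsyncRead
    let decoder = GzipDecoder::new(reader);
    // Decompressing AsyncRead -> Stream of decompressed Bytes
    ReaderStream::new(decoder)
}
```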
You can find how I performed the benchmark at https://github.com/magurotuna/deno_fetch_decompression_throughput