tower-rs/tower-http

decompression: worse throughput when using `tower_http::decompression` than manual impl with `async-compression`


  • I have looked for existing issues (including closed) about this

Bug Report

I've observed that in some situations decompression throughput gets significantly worse when using `tower_http::decompression` compared to manually implementing similar logic with the `async-compression` crate.

Version

Platform

Apple silicon macOS

(Not 100% sure, but this should happen on Linux as well)

Description

In Deno, we switched the inner implementation of fetch (the JavaScript API) from a reqwest-based one to a hyper-util-based one.

denoland/deno#24593

The hyper-util-based implementation uses `tower_http::decompression` to decompress the fetched data when necessary. Note that reqwest does not use tower_http.

After this change, we started to see degraded throughput, especially when the server serves large compressed data. Look at the following graph, which shows how long each Deno version takes to complete 2k requests, where it fetches compressed data from the upstream server and forwards it to the end client.

(Graph: performance of proxying compressed data in different versions of Deno)

v1.45.2 is the last version before we switched to the hyper-util-based fetch implementation. Since v1.45.3, where we landed the change, throughput has been roughly 10x worse.

I then identified `tower_http::decompression` as the cause, and found that if we implement the decompression logic by using the `async-compression` crate directly, performance returns to what it was. (See denoland/deno#25800 for how the manual implementation with `async-compression` affects performance.)

You can find how I performed the benchmark at https://github.com/magurotuna/deno_fetch_decompression_throughput.