Bug Report
- I have looked for existing issues (including closed) about this
I've seen that in some situations the throughput of decompression gets significantly worse when using tower_http::decompression compared to manually implementing similar logic with the async-compression crate.
Version
Platform
Apple silicon macOS
(Not 100% sure, but it should happen on Linux as well)
Description
In Deno, we switched the inner implementation of fetch (the JavaScript API) from being reqwest-based to hyper-util-based.
The hyper-util-based implementation uses tower_http::decompression to decompress the fetched data when necessary. Note that reqwest does not use tower_http.
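For reference, here is a minimal sketch of that kind of setup: a hyper-util client wrapped in tower_http's decompression middleware. This is not Deno's actual code; the body type, URL, and enabled crate features (e.g. tower-http's decompression-gzip, tower's util) are assumptions for illustration.

```rust
// Minimal sketch (not Deno's actual code): a hyper-util client wrapped in
// tower_http's decompression middleware.
use http_body_util::Empty;
use hyper_util::client::legacy::Client;
use hyper_util::rt::TokioExecutor;
use tower::ServiceExt; // for `oneshot`
use tower_http::decompression::Decompression;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Plain HTTP client built on hyper-util.
    let client = Client::builder(TokioExecutor::new()).build_http::<Empty<bytes::Bytes>>();

    // Wrapping the client in `Decompression` makes it decompress response
    // bodies according to their Content-Encoding (for the codecs enabled via
    // tower-http's decompression-* features).
    let svc = Decompression::new(client);

    // Hypothetical request; the URL is a placeholder.
    let req = http::Request::builder()
        .uri("http://example.com/large.json")
        .header(http::header::ACCEPT_ENCODING, "gzip")
        .body(Empty::<bytes::Bytes>::new())?;

    let res = svc.oneshot(req).await?;
    println!("status: {}", res.status());
    Ok(())
}
```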
After this change, we started to see degraded throughput, especially when the server serves large compressed data. Look at the following graph, which shows how long each Deno version takes to complete 2k requests in which it fetches compressed data from the upstream server and then forwards it to the end client.
v1.45.2 is the last version before we switched to the hyper-based fetch implementation. Since v1.45.3, where we landed the switch, the throughput has been 10x worse.
I then identified tower_http::decompression as the cause of this issue, and found that if we implement the decompression logic by using the async-compression crate directly, the performance returns to what it was. (See denoland/deno#25800 for how the manual implementation with async-compression affects the performance.)
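For comparison, below is a rough sketch of that kind of manual approach: decompressing the response body directly with async-compression. It assumes a gzip-encoded hyper::body::Incoming body and is only illustrative; it is not the code from denoland/deno#25800.

```rust
// Rough sketch of manual decompression with async-compression (illustrative,
// not the actual Deno implementation). Assumes a gzip-encoded body.
use async_compression::tokio::bufread::GzipDecoder;
use futures_util::TryStreamExt;
use http_body_util::BodyDataStream;
use tokio::io::AsyncReadExt;
use tokio_util::io::StreamReader;

async fn read_gzip_body(body: hyper::body::Incoming) -> std::io::Result<Vec<u8>> {
    // Turn the streaming body into an AsyncBufRead over the compressed bytes.
    let stream = BodyDataStream::new(body)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e));
    let reader = StreamReader::new(stream);

    // Wrap the reader in a streaming gzip decoder.
    let mut decoder = GzipDecoder::new(reader);

    // Read out the decompressed bytes. A real proxy would stream the chunks
    // onward to the client instead of buffering the whole body like this.
    let mut out = Vec::new();
    decoder.read_to_end(&mut out).await?;
    Ok(out)
}
```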
You can find how I performed the benchmark at https://github.com/magurotuna/deno_fetch_decompression_throughput
