zytedata/python-zyte-api

Enable compressed HTTP calls by default

BurnzZ opened this issue · 7 comments

As @shaneaevans has pointed out, we can shave off some network transfer costs by requesting compressed HTTP responses from the API.

By default, Zyte API returns uncompressed responses. It only returns compressed responses when explicitly asked to via the Accept-Encoding request header, as in:

curl \
   --user API_KEY: \
   --header 'Content-Type: application/json' \
   --header 'Accept-Encoding: gzip' \
   --output 'out' \
   --data '{"url": "https://example.com/foo/bar", "browserHtml": true}' \
   https://api.zyte.com/v1/extract
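For comparison on the Python side, here is a minimal sketch that builds (but does not send) an equivalent request with the standard library's urllib. The helper name build_extract_request is hypothetical and not part of python-zyte-api. curl's "--user API_KEY:" is rendered as HTTP Basic auth with the key as the username and an empty password. Note that urllib, unlike aiohttp, does not decompress responses automatically.

```python
import base64
import json
import urllib.request

API_URL = "https://api.zyte.com/v1/extract"


def build_extract_request(api_key: str, url: str) -> urllib.request.Request:
    """Build (without sending) a request mirroring the curl command."""
    # `--user API_KEY:` means Basic auth: key as username, empty password.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    body = json.dumps({"url": url, "browserHtml": True}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",  # curl switches to POST when --data is given
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            # Ask the API for a gzip-compressed response body.
            "Accept-Encoding": "gzip",
        },
    )
```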

Currently, the client only sets the User-Agent header (code reference).

We'll need to:

  • decide which Accept-Encoding value to send by default,
  • provide an easy way for users to change it, and
  • have the client automatically decompress the response if compressed.
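The third point could look roughly like the sketch below: a hypothetical decompress_body helper (not an existing python-zyte-api function) that undoes a single Content-Encoding using the standard library and treats brotli as an optional extra. It assumes the brotli package's decompress function for "br" and does not handle stacked encodings such as "gzip, br".

```python
import gzip
import zlib


def decompress_body(body: bytes, content_encoding: str) -> bytes:
    """Undo the single Content-Encoding a server applied (hypothetical helper)."""
    encoding = content_encoding.strip().lower()
    if encoding in ("", "identity"):
        return body  # not compressed
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        # HTTP "deflate" should be zlib-wrapped, but some servers send raw DEFLATE.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    if encoding == "br":
        # Optional dependency: only works when the brotli package is installed.
        import brotli

        return brotli.decompress(body)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")
```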

kmike commented

We should definitely use compression and make the client send the right Accept-Encoding header. Is the client really not sending it now? That surprises me.

Based on aio-libs/aiohttp#5219, I think you are right and we are doing it already.

That said, maybe we could make brotli a dependency to improve performance further.

The aiohttp.request function has compress=False as the default (reference), which applies "deflate" when enabled. Our current usage of aiohttp.request does not appear to pass this parameter (reference).

What do you think about sending Accept-Encoding: br, gzip by default (assuming brotli is installed as a main dependency)?

Alternatively, we could make brotli an optional dependency and only send Accept-Encoding: br, gzip when it is installed, falling back to Accept-Encoding: gzip otherwise.
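A minimal sketch of the optional-dependency variant, assuming brotli is importable under the module name brotli. The functions brotli_available, default_accept_encoding, and build_headers are hypothetical names chosen for illustration. Accepting an explicit accept_encoding argument is one way to cover the "easy way for users to change this" point from the issue description.

```python
import importlib.util
from typing import Dict, Optional


def brotli_available() -> bool:
    """True when the optional brotli package can be imported."""
    return importlib.util.find_spec("brotli") is not None


def default_accept_encoding() -> str:
    # Advertise br only when the client can actually decode it.
    return "br, gzip" if brotli_available() else "gzip"


def build_headers(user_agent: str, accept_encoding: Optional[str] = None) -> Dict[str, str]:
    """Request headers with a compression default that callers can override."""
    if accept_encoding is None:
        accept_encoding = default_accept_encoding()
    return {"User-Agent": user_agent, "Accept-Encoding": accept_encoding}
```

For example, build_headers("my-client/1.0", accept_encoding="identity") would ask the server not to compress at all.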

kmike commented

> The aiohttp.request function has compress=False as the default (reference), which applies "deflate" when enabled. Our current usage of aiohttp.request does not appear to pass this parameter (reference).

compress=False is for compressing requests, not for decompressing responses; I think that's a separate discussion topic. It would also need support from the server, and I'm not sure whether that would work.
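To make the distinction concrete, here is a generic standard-library sketch (not aiohttp's implementation). With request compression, the client encodes the body it sends and labels it with Content-Encoding, so the server must know how to decode it. With response compression, the client only advertises Accept-Encoding and the server decides. The function and constant names are hypothetical.

```python
import json
import zlib
from typing import Dict, Tuple


def compress_request(payload: dict) -> Tuple[bytes, Dict[str, str]]:
    """Request compression: the client compresses the body it sends."""
    raw = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json", "Content-Encoding": "deflate"}
    return zlib.compress(raw), headers


def server_read_body(body: bytes, headers: Dict[str, str]) -> dict:
    """What the server must support in order to accept such a request."""
    if headers.get("Content-Encoding") == "deflate":
        body = zlib.decompress(body)
    return json.loads(body)


# Response compression is a different mechanism: the client only asks via
# Accept-Encoding, and the server decides whether to compress its reply.
RESPONSE_COMPRESSION_HEADERS = {"Accept-Encoding": "gzip, deflate"}
```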

> What do you think about sending Accept-Encoding: br, gzip by default (assuming brotli is installed as a main dependency)? Alternatively, we could make brotli an optional dependency and only send Accept-Encoding: br, gzip when it is installed, falling back to Accept-Encoding: gzip otherwise.

If brotli is installed, aiohttp adds br to Accept-Encoding automatically, so we don't need to do anything here. Documenting that installing brotli is beneficial could help, though I'm not sure whether the server supports brotli.

> compress=False is for compressing requests, not for decompressing responses; I think that's a separate discussion topic. It would also need support from the server, and I'm not sure whether that would work.

Testing with curl, the server does return compressed responses in gzip, deflate, and br when asked via the Accept-Encoding request header.

As for decompression, aiohttp automatically decompresses the response body, since auto_decompress=True is the client's default (reference).
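The round trip described here (the server compresses only when asked, and the client transparently decompresses) can be reproduced locally with a sketch like the following. A toy http.server handler stands in for the API, and urllib on the client side decompresses by hand what aiohttp would handle with auto_decompress=True. Nothing here talks to the real Zyte API, and all names are illustrative.

```python
import gzip
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAYLOAD = b'{"browserHtml": "<html>...</html>"}' * 100


class CompressingHandler(BaseHTTPRequestHandler):
    """Toy stand-in for the API: gzips the body only when the client asks."""

    def do_GET(self):
        body = PAYLOAD
        accepted = self.headers.get("Accept-Encoding", "")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        if "gzip" in accepted:
            body = gzip.compress(body)
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url, accept_encoding=None):
    """Return (bytes on the wire, decoded body), decompressing gzip by hand."""
    headers = {"Accept-Encoding": accept_encoding} if accept_encoding else {}
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        wire = resp.read()
        encoding = resp.headers.get("Content-Encoding", "")
    body = gzip.decompress(wire) if encoding == "gzip" else wire
    return len(wire), body


server = ThreadingHTTPServer(("127.0.0.1", 0), CompressingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"
try:
    plain_size, plain = fetch(url)          # urllib sends "identity" by default
    gz_size, decoded = fetch(url, "gzip")   # explicitly ask for gzip
finally:
    server.shutdown()
    server.server_close()
```

Comparing plain_size with gz_size shows the kind of transfer savings the issue is after. With this highly repetitive payload the gzip body is far smaller, though real savings depend on the content.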

kmike commented

I'm not sure I follow :) Is there anything for us to implement, or can we close this ticket?

My apologies, I thought aiohttp wasn't requesting compressed responses by default. It turns out it does send gzip in Accept-Encoding; I confirmed this manually by debugging the code.