psf/cachecontrol

urllib3 2.0: Double reading causes IncompleteRead error

frostming opened this issue · 0 comments

In the Serializer.dumps() method, the response's underlying fp is replaced with a new, never-read stream, which causes the body to be read twice: https://github.com/ionrock/cachecontrol/blob/c05ef9eff1c9ac176481fb99e7a7188aa5b4e17b/cachecontrol/serialize.py#L35-L36

This leads to an IncompleteRead error on urllib3 2.0, which now strictly checks the HTTPResponse.length_remaining attribute. After the second read that attribute is negative (content_length - 2 * content_length = -content_length).

I have no idea why this replacement is necessary.

This can be reproduced with the following minimal example:

from cachecontrol import CacheControl
from cachecontrol.serialize import Serializer as LegacySerializer
from requests import Session


class Serializer(LegacySerializer):
    def dumps(self, request, response, body=None):
        if not hasattr(response, "strict"):
            # XXX: urllib3 2.0 removes this attribute
            response.strict = False
        return super().dumps(request, response, body)

    def prepare_response(self, request, cached, body_file=None):
        # We don't need to pass strict to HTTPResponse
        cached["response"].pop("strict", None)
        return super().prepare_response(request, cached, body_file)


s = CacheControl(Session(), serializer=Serializer())
url = "https://pypi.org/simple/pdm-backend/"
headers = {"Cache-Control": "max-age=0"}
s.get(url, headers=headers).content
resp = s.get(url, headers=headers)
assert resp.from_cache
resp.content

The error:

Traceback (most recent call last):
  File "/Users/fming/wkspace/github/pdm/venv/lib/python3.10/site-packages/urllib3/response.py", line 705, in _error_catcher
    yield
  File "/Users/fming/wkspace/github/pdm/venv/lib/python3.10/site-packages/urllib3/response.py", line 830, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(4658 bytes read, -2329 more expected)

...(more tracebacks omitted)

Note that the read size (4658) is twice the content length (2329).
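
To make the arithmetic concrete, here is a rough, simplified model of the double read. It does not use urllib3 or cachecontrol. `CountingResponse` is a hypothetical stand-in that only tracks the two counters shown in the error message, and the body size is taken from the traceback above. It is meant as a sketch of how the numbers come about, not as a description of urllib3's actual internals.

```python
import io


class CountingResponse:
    """Hypothetical stand-in for a response object (not urllib3's HTTPResponse).

    It tracks how many bytes were read in total and how many are still
    expected according to the declared content length.
    """

    def __init__(self, body: bytes):
        self.fp = io.BytesIO(body)
        self.length_remaining = len(body)  # declared content length
        self.bytes_read = 0

    def read(self) -> bytes:
        data = self.fp.read()
        self.bytes_read += len(data)
        self.length_remaining -= len(data)
        return data


body = b"x" * 2329  # content length taken from the traceback
resp = CountingResponse(body)

first = resp.read()          # first read: length_remaining drops to 0
resp.fp = io.BytesIO(first)  # fp swapped for a fresh, never-read stream
second = resp.read()         # second read: the same bytes are counted again

print(resp.bytes_read, resp.length_remaining)
```

Under these assumptions the counters end at 4658 bytes read and -2329 remaining, which matches the values in the IncompleteRead message.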

Related to #264