aio-libs/aiohttp

Received "Response payload is not completed" when reading response

Closed this issue · 88 comments

๐Ÿž Describe the bug

A ClientPayloadError: Response payload is not completed exception is raised when reading the response of a GET request from the GitHub REST API (this should not be especially relevant). The response seems correct and using curl succeeds all the time.

💡 To Reproduce

Do lots of requests against an endpoint of the GitHub API (it happens from time to time).
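
The failure mode can be exercised with a loop along these lines (a sketch; the endpoint and attempt count are illustrative, not from the report):

```python
# Hypothetical reproduction sketch: hammer a single GitHub API endpoint
# until ClientPayloadError surfaces. The endpoint and attempt count are
# illustrative; the error only appears intermittently.
import asyncio

import aiohttp


async def hammer(url: str, attempts: int = 500) -> int:
    """Return the attempt number that failed, or -1 if all succeeded."""
    async with aiohttp.ClientSession() as session:
        for i in range(attempts):
            try:
                async with session.get(url) as resp:
                    await resp.read()
            except aiohttp.ClientPayloadError:
                return i
    return -1

# To actually run it (requires network access):
# asyncio.run(hammer("https://api.github.com/repos/aio-libs/aiohttp/labels"))
```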

💡 Expected behavior
The response is correctly parsed.

📋 Logs/tracebacks

  File "/opt/lib/python3.8/site-packages/gidgethub/aiohttp.py", line 20, in _request
    return response.status, response.headers, await response.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 973, in read
    self._body = await self.content.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 358, in read
    block = await self.readany()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 380, in readany
    await self._wait('readany')
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 296, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Some contextual information at the time of the exception (at aiohttp/streams.py, line 358, in read):

blocks =  [b'[\n  {\n    "id": 2941521,\n    "node_id": "MDU6TGFiZWwyOTQxNTIx",\n    "url": "https://REDACTED_GITHUB_URL/repos/testorg/test-debian/labels/skip-issue",\n    "name": "skip-issue",\n    "color": "000000",\n    "default": false\n  }\n]\n']

n=-1

As you can see, the blocks contain the whole payload (the full JSON), but aiohttp still complains about the payload not being completed.

📋 Your version of the Python

$ python --version
Python 3.8.1

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: async-timeout, attrs, chardet, multidict, yarl
Required-by: raven-aiohttp, pytest-aiohttp, aiojobs
$ python -m pip show multidict
Name: multidict
Version: 4.7.3
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.4.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /Users/pgalindo3/.pyenv/versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp

📋 Additional context

I suspect this has to do with this comment:

aiohttp/aiohttp/streams.py

Lines 352 to 356 in 1740c99

# TODO: should be `if` instead of `while`
# because waiter maybe triggered on chunk end,
# without feeding any data
while not self._buffer and not self._eof:
    await self._wait('read')

as this is where it fails, and the situation appears to be the one described in the TODO.

CC: @socketpair @asvetlov as the TODO was added in 5c4cb82

I'm curious if it's somehow related to #4549.

I'm curious if it's somehow related to #4549.

I encountered these two problems at the same time. #4549 #4581
Python: 3.8.4
aiohttp: 3.6.2

I think there is some relationship between them.
And I think the common trigger is frequent requests to a single domain name.

CC: @socketpair @asvetlov as the TODO was added in 5c4cb82

Changing the while to an if doesn't seem to help.

Hi there, I've run into this error too. My use case is a reverse proxy in front of gunicorn. The other reports of this error on this repo appear to involve poorly behaved HTTP servers, which I think is a separate issue.

My code streams the proxy body back to upstream, using code roughly like:

response = web.StreamingResponse(headers=response_headers, status=status)
await response.prepare(request)  # the response must be prepared before write()
async for chunk in gunicorn_response.content.iter_chunked(64 * 1_024):
    await response.write(chunk)
await response.write_eof()
return response

What seems to be happening is that gunicorn returns a Connection: close header and then there is a race condition between aiohttp reading the body and the connection being closed. If I get the data out of aiohttp in time, it works, but sometimes this stack trace is triggered.

Investigation:

  • Calling await gunicorn_response.content.wait_eof() prior to calling iter_chunked reduces the incidence of the error, but it still happens occasionally.
  • Setting the Connection: keep-alive header in the reverse proxy's request to gunicorn resolves this issue. This is an acceptable workaround for me.
  • Changing the while to an if in the source code in conjunction with calling wait_eof() prior to iterating over the body resolves the issue.
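
The keep-alive workaround from the second bullet could be sketched roughly like this (my assumptions: the upstream address is made up, and hop-by-hop header filtering that a real proxy needs is elided):

```python
# Sketch of the workaround: force keep-alive on the proxy's upstream
# request so gunicorn doesn't close the connection while the body is
# still being read. The upstream URL is illustrative; a production
# proxy must also strip hop-by-hop headers before copying them.
import asyncio

import aiohttp
from aiohttp import web

UPSTREAM_HEADERS = {"Connection": "keep-alive"}


async def proxy_handler(request: web.Request) -> web.StreamResponse:
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://127.0.0.1:8000" + request.path_qs, headers=UPSTREAM_HEADERS
        ) as upstream:
            response = web.StreamResponse(
                status=upstream.status, headers=upstream.headers.copy()
            )
            await response.prepare(request)
            async for chunk in upstream.content.iter_chunked(64 * 1024):
                await response.write(chunk)
            await response.write_eof()
            return response
```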

we have this happening between an aiohttp 3.8.1 client + server in production, trying to reproduce

I've just experienced the same issue via s3fs, downloading a few hundred files from S3.

Python: 3.7.12
aiohttp: 3.8.1
yarl: 1.7.2
multidict: 5.2.0

Traceback (most recent call last):
  File "dataset_io.py", line 336, in download
    filesystem.get(self._location, self._download_location, recursive=True)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 529, in _get
    coros, batch_size=batch_size, callback=callback
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 249, in _run_coros_in_chunks
    await asyncio.gather(*chunk, return_exceptions=return_exceptions),
  File "/root/.pyenv/versions/3.7.12/lib/python3.7/asyncio/tasks.py", line 414, in wait_for
    return await fut
  File "/.venv/lib/python3.7/site-packages/s3fs/core.py", line 1002, in _get_file
    chunk = await body.read(2 ** 16)
  File "/.venv/lib/python3.7/site-packages/aiobotocore/response.py", line 53, in read
    chunk = await self.__wrapped__.read(amt if amt is not None else -1)
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 385, in read
    await self._wait("read")
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 304, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Sorry I don't have much more information here, other than the fact that this is occurring on Python 3.7, whereas #4549 has been described as occurring on Python 3.8 only.

a-gn commented

I got this error on Python 3.9 with aiohttp 3.8.1 on Windows. I haven't been able to reproduce.

My server failed around the same time as the error. The backoff library re-ran the request, and I got the ClientPayloadError. Is this expected? Shouldn't losing the server raise ConnectionError instead?
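
For reference, the retry pattern that the backoff library automates can be sketched generically; PayloadError below is a dependency-free stand-in for aiohttp.ClientPayloadError, and the helper name is invented:

```python
# Generic retry sketch for transient payload errors. PayloadError is a
# stand-in for aiohttp.ClientPayloadError so the example needs no
# third-party packages; the pattern is what backoff automates.
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class PayloadError(Exception):
    """Stand-in for aiohttp.ClientPayloadError."""


async def retry(
    op: Callable[[], Awaitable[T]], attempts: int = 3, delay: float = 0.0
) -> T:
    for attempt in range(1, attempts + 1):
        try:
            return await op()
        except PayloadError:
            if attempt == attempts:
                raise  # exhausted: surface the last error
            await asyncio.sleep(delay * attempt)  # linear backoff
    raise AssertionError("unreachable")


# Demo: an operation that fails twice, then succeeds on the third call.
calls = 0


async def flaky() -> str:
    global calls
    calls += 1
    if calls < 3:
        raise PayloadError("Response payload is not completed")
    return "ok"


result = asyncio.run(retry(flaky))
```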

I have the same issue with Python 3.10

Same error here!

I've just experienced the same issue via s3fs, downloading a few hundred files from S3.

I am also getting that error using iter_chunked when downloading files from S3

Every time I download the last few files, this error message appears

Hi, is there any fix for this? Appreciate it.

My services occasionally crash with this error, on Python 3.10.12 and aiohttp 3.8.5.

What are our obstacles in isolating this issue, before we can fix it?

I'll ensure I hold on to the logs the next time it arises, do we know what's causing it?

Thanks!

What are our obstacles in isolating this issue, before we can fix it?

Not sure. Maybe we can figure out a message that causes the problem, or maybe it needs multiple steps to trigger.
It looks like it might be related to #5238, so maybe we can get that PR updated and merged.

I'll try that PR out, will report if something breaks

@Dreamsorcerer @TheOnlyWayUp Did you manage to figure this out? It may be a session-setting issue with stale connections. I am running into this issue with langchainjs, which uses OpenAI's Python client.

All discussion is above, so feel free to follow up. PR #5238 could be worked on to get it merged, or if you can figure out a test (that runs in our CI) to reproduce the issue, then that would be a great help.

I started to have this issue with 3.8.6 on pypy 3.9.18 consistently on CI, 80% of the time on my local box for a very specific streaming unittest (so I can trigger it quite easily with a simple pytest run). It was working fine on 3.8.5.

Funny thing is that reading your ChangeLog, I saw the change for lenient parsing in llhttp and the -X dev mode that does the parsing differently, so out of ideas, I tried to run pytest with python -X dev -m pytest mytest, and suddenly the error is gone. If I remove again the -X dev, the error is back. Strangely, this works great on any other Python version.

Hope this helps somehow.

Stack trace is consistent with everyone else's in this thread

../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/client_reqrep.py:1043: in read
    self._body = await self.content.read()
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:375: in read
    block = await self.readany()
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:397: in readany
    await self._wait("readany")
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:304: in _wait
    await waiter
/usr/lib/pypy3.9/asyncio/futures.py:284: in __await__
    yield self  # This tells Task to wait for completion.
/usr/lib/pypy3.9/asyncio/tasks.py:328: in __wakeup
    future.result()

I tried to run pytest with python -X dev -m pytest mytest, and suddenly the error is gone. If I remove again the -X dev, the error is back.

Hmm, that's surprising. I wonder if strict parsing is causing the connection to get closed without raising an error somehow, thus forcing it to open a new connection...

I can confirm this on python 3.11/aiohttp 3.8.6

If I downgrade to 3.8.5 I don't have a problem if I do:

if (method == "GET):
   await session.get(....)
if (method == "POST")
  await session.post(...)

Strangely, if I just do session.request(method, ...), I get the error on 3.8.5 as well.

The possible fix (depending on what really is the root cause) is in 3.9.0b1, so please try to reproduce with that.

Nope, still failing.

Then someone will need to do some more debugging and/or figure out a reliable test that we can use to reproduce the behaviour.

Another data point: this is happening to our Octomachinery-based bots (Chronographer and Patchback), which are built on top of gidgethub+aiohttp, just like in Pablo's traceback.
It's happening under CPython 3.9.13 on the quay.io/fedora/python-39:latest image with aiohttp 3.6.2. The bots are running Sentry, so it's possible to look a bit deeper into the tracebacks: sanitizers/octomachinery#59.
Upon inspecting them, I noticed that they are related to GET and PATCH requests to the GitHub APIs. Nothing really useful.
Judging from the other hints in this issue, this might be reproducible in slow envs, like with high network latency or PyPy (which is often slower closer to its startup, before the JIT warms up).
Back to gidgethub, it uses aiohttp.ClientSession.request() — https://github.com/gidgethub/gidgethub/blob/7474cae/gidgethub/aiohttp.py#L19, which is in line with one of the comments above suggesting that calling dedicated methods like .get(), .patch() instead of .request() might yield different behavior.

Anyway, having Sentry in place, we could probably make some educated guesses and instrument my bots to log more things. If anybody has ideas, I'm all ears...

Comparing .get(), .patch() and .request(), I concluded that they are virtually equivalent and haven't changed across versions. So @rlippmann's observation must be caused by something else. Also note that all of these methods return async context managers and should be used through async with and not await, for proper cleanup of the underlying response object.
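
To illustrate the cleanup point (a sketch; the function names are made up):

```python
# Sketch contrasting the two usage styles. The `async with` form
# releases the underlying connection deterministically; the bare
# `await` form leaves the response for garbage collection to clean up.
import asyncio

import aiohttp


async def fetch_ok(session: aiohttp.ClientSession, url: str) -> bytes:
    # Preferred: the context manager guarantees the response is released.
    async with session.get(url) as resp:
        return await resp.read()


async def fetch_leaky(session: aiohttp.ClientSession, url: str) -> bytes:
    # Works, but relies on GC to release the connection back to the pool.
    resp = await session.get(url)
    return await resp.read()
```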

So the exception is set here:

ClientPayloadError("Response payload is not completed")

I wonder if one of these tests is a reproducer (when the exception processing is unsuppressed):

  • async def test_static_file_huge_cancel(aiohttp_client: Any, tmp_path: Any) -> None:
        file_path = tmp_path / "huge_data.unknown_mime_type"
        # fill 100MB file
        with file_path.open("wb") as f:
            for i in range(1024 * 20):
                f.write((chr(i % 64 + 0x20) * 1024).encode())

        task = None

        async def handler(request):
            nonlocal task
            task = request.task
            # reduce send buffer size
            tr = request.transport
            sock = tr.get_extra_info("socket")
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024)
            ret = web.FileResponse(file_path)
            return ret

        app = web.Application()
        app.router.add_get("/", handler)
        client = await aiohttp_client(app)

        resp = await client.get("/")
        assert resp.status == 200
        task.cancel()
        await asyncio.sleep(0)
        data = b""
        while True:
            try:
                data += await resp.content.read(1024)
            except aiohttp.ClientPayloadError:
                break
        assert len(data) < 1024 * 1024 * 20

        await resp.release()
        await client.close()

  • async def test_bad_payload_compression(aiohttp_client: Any) -> None:
        async def handler(request):
            resp = web.Response(text="text")
            resp.headers["Content-Encoding"] = "gzip"
            return resp

        app = web.Application()
        app.router.add_get("/", handler)
        client = await aiohttp_client(app)

        resp = await client.get("/")
        assert 200 == resp.status
        with pytest.raises(aiohttp.ClientPayloadError):
            await resp.read()
        resp.close()

  • async def test_raw_server_cancelled_in_write_eof(
        aiohttp_raw_server: Any, aiohttp_client: Any
    ):
        class MyResponse(web.Response):
            async def write_eof(self, data=b""):
                raise asyncio.CancelledError("error")

        async def handler(request):
            resp = MyResponse(text=str(request.rel_url))
            return resp

        loop = asyncio.get_event_loop()
        loop.set_debug(True)
        logger = mock.Mock()
        server = await aiohttp_raw_server(handler, logger=logger)
        cli = await aiohttp_client(server)

        resp = await cli.get("/path/to")
        with pytest.raises(client.ClientPayloadError):
            await resp.read()
        logger.debug.assert_called_with("Ignored premature client disconnection")

So the exception is set here:

ClientPayloadError("Response payload is not completed")

That code suppresses the actual exception. I think we need to at least log the original exception, or ideally parent the original exception (like raise ... from ...) somehow.

That code suppresses the actual exception. I think we need to at least log the original exception, or ideally parent the original exception (like raise ... from ...) somehow.

Yes, I was also thinking that.
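
A minimal sketch of the chaining being suggested, with stand-in exception classes rather than aiohttp's real ones:

```python
# Minimal sketch of exception chaining: attach the original parser error
# as the __cause__ of the payload error so the traceback prints
# "The above exception was the direct cause of the following exception".
# Both classes below are stand-ins for aiohttp's real exceptions.
class ContentLengthError(Exception):
    """Stand-in for aiohttp.http_exceptions.ContentLengthError."""


class ClientPayloadError(Exception):
    """Stand-in for aiohttp.client_exceptions.ClientPayloadError."""


def wrap(underlying: BaseException) -> ClientPayloadError:
    wrapped = ClientPayloadError(
        f"Response payload is not completed: {underlying!r}"
    )
    wrapped.__cause__ = underlying  # same effect as `raise ... from underlying`
    return wrapped


original = ContentLengthError("Not enough data for satisfy content length header.")
chained = wrap(original)
```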

async def test_static_file_huge_cancel(aiohttp_client: Any, tmp_path: Any) -> None:
    file_path = tmp_path / "huge_data.unknown_mime_type"
    # fill 100MB file
    with file_path.open("wb") as f:
        for i in range(1024 * 20):
            f.write((chr(i % 64 + 0x20) * 1024).encode())

    task = None

    async def handler(request):
        nonlocal task
        task = request.task
        # reduce send buffer size
        tr = request.transport
        sock = tr.get_extra_info("socket")
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024)
        ret = web.FileResponse(file_path)
        return ret

    app = web.Application()
    app.router.add_get("/", handler)
    client = await aiohttp_client(app)

    resp = await client.get("/")
    assert resp.status == 200
    task.cancel()
    await asyncio.sleep(0)
    data = b""
    while True:
        try:
            data += await resp.content.read(1024)
        except aiohttp.ClientPayloadError:
            break
    assert len(data) < 1024 * 1024 * 20

    await resp.release()
    await client.close()

When I injected a breakpoint here, under the except block, and modified client_proto.py to include the causing exception details, this is what I got when running the specific test with pytest -s:

(Pdb) pp traceback.print_exc()
Traceback (most recent call last):
  File "~/src/github/aio-libs/aiohttp/aiohttp/client_proto.py", line 95, in connection_lost
    uncompleted = self._parser.feed_eof()
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "aiohttp/_http_parser.pyx", line 510, in aiohttp._http_parser.HttpParser.feed_eof
    raise ContentLengthError(
aiohttp.http_exceptions.ContentLengthError: 400, message:
  Not enough data for satisfy content length header.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/src/github/aio-libs/aiohttp/tests/test_web_sendfile_functional.py", line 985, in test_static_file_huge_cancel
    data += await resp.content.read(1024)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/src/github/aio-libs/aiohttp/aiohttp/streams.py", line 347, in read
    raise self._exception
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>
None

Hmm, should that exception be ignored, or should we not even be trying to feed EOF?

As far as I can see, that means the connection has been dropped before we've read the full response, and then we try to pretend that we received EOF in order to finish it off?

But, in the context of that test, this is clearly correct, expected behaviour currently:

Reading from the stream may raise aiohttp.ClientPayloadError if the response object is closed before response receives all data or in case if any transfer encoding related errors like malformed chunked encoding of broken compression data.
https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.ClientResponse.content

No idea... But I think that test may be a basis for a new one.
FTR this is what I did:

diff --git a/aiohttp/client_proto.py b/aiohttp/client_proto.py
index c42c9540d..cc5916a37 100644
--- a/aiohttp/client_proto.py
+++ b/aiohttp/client_proto.py
@@ -93,11 +93,13 @@ class ResponseHandler(BaseProtocol, DataQueue[Tuple[RawResponseMessage, StreamRe
         if self._parser is not None:
             try:
                 uncompleted = self._parser.feed_eof()
-            except Exception:
+            except Exception as underlying_exc:
                 if self._payload is not None:
-                    self._payload.set_exception(
-                        ClientPayloadError("Response payload is not completed")
+                    reraised_exc = ClientPayloadError(
+                        f"Response payload is not completed: {underlying_exc !r}",
                     )
+                    reraised_exc.__cause__ = underlying_exc
+                    self._payload.set_exception(reraised_exc)
 
         if not self.is_eof():
             if isinstance(exc, OSError):
diff --git a/tests/test_web_sendfile_functional.py b/tests/test_web_sendfile_functional.py
index 6a64f8aa2..ec0c04950 100644
--- a/tests/test_web_sendfile_functional.py
+++ b/tests/test_web_sendfile_functional.py
@@ -966,6 +966,7 @@ async def test_static_file_huge_cancel(aiohttp_client: Any, tmp_path: Any) -> No
         sock = tr.get_extra_info("socket")
         sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024)
         ret = web.FileResponse(file_path)
+        # task.cancel()
         return ret
 
     app = web.Application()
@@ -974,15 +975,21 @@ async def test_static_file_huge_cancel(aiohttp_client: Any, tmp_path: Any) -> No
     client = await aiohttp_client(app)
 
     resp = await client.get("/")
-    assert resp.status == 200
+    # assert resp.status == 200
     task.cancel()
-    await asyncio.sleep(0)
+    # await asyncio.sleep(0)
+    await asyncio.sleep(10)
     data = b""
     while True:
         try:
             data += await resp.content.read(1024)
-        except aiohttp.ClientPayloadError:
+        except aiohttp.ClientPayloadError as payload_err:
+            import sys, traceback
+            traceback.print_exc()
+            traceback.print_exception(payload_err.__cause__)
+            breakpoint()
             break
+    assert resp.status == 200
     assert len(data) < 1024 * 1024 * 20
 
     await resp.release()
+        # task.cancel()

This still produces an error when the task isn't cancelled? That doesn't seem right.

But, the changes to logging should probably be put into a PR regardless.

+        # task.cancel()

This still produces an error when the task isn't cancelled? That doesn't seem right.

No, that's a leftover. There's still the original cancel outside the handler.

But, the changes to logging should probably be put into a PR regardless.

Totally. I haven't gotten to that. Just experimenting with stuff at the moment. Feel free to copy my patch into a PR.

But, the changes to logging should probably be put into a PR regardless.

A more complete PR should probably update aiohttp.helpers.set_exception() with a new cause argument (with some sentinel default, because None has a special meaning). That helper should (maybe?) check that the exception doesn't already have a cause set (not sure).
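
Such a helper might be sketched like this (the sentinel, the `cause` argument name, and the don't-overwrite check are all invented here, not aiohttp's actual API):

```python
# Sketch of a helpers.set_exception() variant taking a `cause` argument.
# _SENTINEL distinguishes "no cause given" from an explicit None, since
# `raise ... from None` has its own meaning (suppressing the context).
import asyncio
from typing import Any

_SENTINEL: Any = object()


def set_exception(
    fut: "asyncio.Future[Any]",
    exc: BaseException,
    cause: Any = _SENTINEL,
) -> None:
    # Only attach a cause if one was given and none is already set.
    if cause is not _SENTINEL and exc.__cause__ is None:
        exc.__cause__ = cause
    if not fut.done():
        fut.set_exception(exc)


# Demo: the future's exception carries the underlying error as its cause.
loop = asyncio.new_event_loop()
fut = loop.create_future()
underlying = ValueError("parser failed")
set_exception(fut, RuntimeError("payload incomplete"), cause=underlying)
err = fut.exception()  # retrieve so no "never retrieved" warning fires
loop.close()
```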

And all set_exception() invocations would need to be replaced with the helper. I don't see any calls actually setting causes across the entire project:

$ rg -B2 '\bset_exception\b'
tests/test_connector.py
1646-        conn._create_connection = mock.Mock()
1647-        conn._create_connection.return_value = loop.create_future()
1648:        conn._create_connection.return_value.set_exception(err)

tests/test_streams.py
267-        assert b"li" == data
268-
269:        stream.set_exception(ValueError())
--
376-        assert b"line\n" == data
377-
378:        stream.set_exception(ValueError())
--
498-        assert b"line" + separator == data
499-
500:        stream.set_exception(ValueError("Another exception"))
--
572-        assert b"li" == data
573-
574:        stream.set_exception(ValueError())
--
644-
645-        exc = ValueError()
646:        stream.set_exception(exc)
--
652-
653-        async def set_err():
654:            stream.set_exception(ValueError())
--
672-        t.cancel()
673-        await asyncio.sleep(0)
674:        # The following line fails if set_exception() isn't careful.
675:        stream.set_exception(RuntimeError("message"))
--
706-        assert b"line\n" == data
707-
708:        stream.set_exception(ValueError())
--
734-        stream = self._make_one()
735-        stream.feed_data(b"line\n")
736:        stream.set_exception(ValueError())
--
1070-        stream = self._make_one()
1071-        exc = RuntimeError()
1072:        stream.set_exception(exc)
--
1101-    s = streams.EmptyStreamReader()
1102-    assert str(s) is not None
1103:    assert s.set_exception(ValueError()) is None
--
1206-        item = object()
1207-        buffer.feed_data(item)
1208:        buffer.set_exception(ValueError)
--
1215-
1216-    async def test_read_exception(self, buffer: Any) -> None:
1217:        buffer.set_exception(ValueError())
--
1223-        val = object()
1224-        buffer.feed_data(val, 1)
1225:        buffer.set_exception(ValueError())
--
1236-
1237-        buffer.feed_eof()
1238:        buffer.set_exception(ValueError())
--
1245-
1246-        exc = ValueError()
1247:        buffer.set_exception(exc)
--
1252-
1253-        async def set_err():
1254:            buffer.set_exception(ValueError())
--
1399-
1400-    exc = ValueError()
1401:    reader.set_exception(exc)
--
1417-
1418-    exc = ValueError()
1419:    reader.set_exception(exc)
--
1432-    reader.on_eof(on_eof)
1433-
1434:    reader.set_exception(ValueError())

tests/test_client_request.py
1017-    async def throw_exc():
1018-        await asyncio.sleep(0.01)
1019:        fut.set_exception(ValueError)
--
1024-    await req._writer
1025-    # assert conn.close.called
1026:    assert conn.protocol.set_exception.called
--
1042-    async def throw_exc():
1043-        await asyncio.sleep(0.01)
1044:        fut.set_exception(inner_exc)
--
1049-    await req._writer
1050-    # assert connection.close.called
1051:    assert conn.protocol.set_exception.called
1052:    outer_exc = conn.protocol.set_exception.call_args[0][0]
--
1149-    await req.write_bytes(writer, conn)
1150-
1151:    assert conn.protocol.set_exception.called
1152:    exc = conn.protocol.set_exception.call_args[0][0]

tests/test_web_functional.py
1847-        async with resp:
1848-            assert resp.status == 200
1849:            resp.content.set_exception(RuntimeError())

tests/test_client_functional.py
954-        url, headers = server.make_url("/"), {"Connection": "Keep-alive"}
955-        resp = await session.get(url, headers=headers)
956:        resp.content.set_exception(ValueError())
--
2694-        with pytest.raises(RuntimeError):
2695-            async with session.get(server.make_url("/")) as resp:
2696:                resp.content.set_exception(RuntimeError())
--
2731-    with pytest.raises(RuntimeError):
2732-        async with cm as resp:
2733:            resp.content.set_exception(RuntimeError())

tests/test_web_websocket.py
313-    exc = EofStream()
314-    res = loop.create_future()
315:    res.set_exception(exc)
--
331-    ws._reader = mock.Mock()
332-    res = loop.create_future()
333:    res.set_exception(asyncio.TimeoutError())

tests/test_client_ws.py
317-
318-                exc = EofStream()
319:                resp._reader.set_exception(exc)
--
349-
350-                exc = ValueError()
351:                resp._reader.set_exception(exc)
--
467-
468-                exc = ValueError()
469:                resp._reader.set_exception(exc)

tests/test_worker.py
248-        worker.alive = False
249-        assert waiter is not None
250:        waiter.set_exception(RuntimeError())

tests/test_client_response.py
55-    connection.protocol = aiohttp.DataQueue(loop)
56-    connection.protocol.set_response_params = mock.Mock()
57:    connection.protocol.set_exception(http.HttpProcessingError())
--
234-    content = response.content = mock.Mock()
235-    content.read.return_value = loop.create_future()
236:    content.read.return_value.set_exception(ValueError)

tests/test_helpers.py
697-
698-
699:# ------------- set_result / set_exception ----------------------
--
717-async def test_set_exception(loop) -> None:
718-    fut = loop.create_future()
719:    helpers.set_exception(fut, RuntimeError())
--
725-    fut = loop.create_future()
726-    fut.cancel()
727:    helpers.set_exception(fut, RuntimeError())

aiohttp/base_protocol.py
77-            waiter.set_result(None)
78-        else:
79:            waiter.set_exception(exc)

aiohttp/streams.py
15-
16-from .base_protocol import BaseProtocol
17:from .helpers import BaseTimerContext, TimerNoop, set_exception, set_result
--
147-        return self._exception
148-
149:    def set_exception(self, exc: BaseException) -> None:
--
154-        if waiter is not None:
155-            self._waiter = None
156:            set_exception(waiter, exc)
--
159-        if waiter is not None:
160-            self._eof_waiter = None
161:            set_exception(waiter, exc)
--
500-        return None
501-
502:    def set_exception(self, exc: BaseException) -> None:
--
575-        return self._exception
576-
577:    def set_exception(self, exc: BaseException) -> None:
--
582-        if waiter is not None:
583-            self._waiter = None
584:            set_exception(waiter, exc)

aiohttp/http_websocket.py
306-        except Exception as exc:
307-            self._exc = exc
308:            self.queue.set_exception(exc)

aiohttp/client_proto.py
12-from .helpers import (
13-    BaseTimerContext,
14:    set_exception,
--
82-
83-        if exc is not None:
84:            set_exception(self.closed, exc)
--
96-            except Exception:
97-                if self._payload is not None:
98:                    self._payload.set_exception(
--
107-            # assigns self._should_close to True as side effect,
108-            # we do it anyway below
109:            self.set_exception(exc)
--
129-        self._reschedule_timeout()
130-
131:    def set_exception(self, exc: BaseException) -> None:
132-        self._should_close = True
133-        self._drop_timeout()
134:        super().set_exception(exc)
--
207-    def _on_read_timeout(self) -> None:
208-        exc = ServerTimeoutError("Timeout on reading data from socket")
209:        self.set_exception(exc)
210-        if self._payload is not None:
211:            self._payload.set_exception(exc)
--
242-                        self.transport.close()
243-                    # should_close is True after the call
244:                    self.set_exception(exc)

aiohttp/client_reqrep.py
570-        except OSError as exc:
571-            if exc.errno is None and isinstance(exc, asyncio.TimeoutError):
572:                protocol.set_exception(exc)
--
577-                new_exc.__context__ = exc
578-                new_exc.__cause__ = exc
579:                protocol.set_exception(new_exc)
--
581-            await writer.write_eof()
582-        except Exception as exc:
583:            protocol.set_exception(exc)
--
1021-        # content can be None here, but the types are cheated elsewhere.
1022-        if content and content.exception() is None:  # type: ignore[truthy-bool]
1023:            content.set_exception(ClientConnectionError("Connection closed"))

aiohttp/http_parser.py
438-                except BaseException as exc:
439-                    if self.payload_exception is not None:
440:                        self._payload_parser.payload.set_exception(
--
442-                        )
443-                    else:
444:                        self._payload_parser.payload.set_exception(exc)
--
823-                                chunk[:pos].decode("ascii", "surrogateescape")
824-                            )
825:                            self.payload.set_exception(exc)
--
926-            self.decompressor = ZLibDecompressor(encoding=encoding)
927-
928:    def set_exception(self, exc: BaseException) -> None:
929:        self.out.set_exception(exc)

aiohttp/web_ws.py
536-    def _cancel(self, exc: BaseException) -> None:
537-        if self._reader is not None:
538:            self._reader.set_exception(exc)

aiohttp/_http_parser.pyx
766-    except BaseException as exc:
767-        if pyparser._payload_exception is not None:
768:            pyparser._payload.set_exception(pyparser._payload_exception(str(exc)))
769-        else:
770:            pyparser._payload.set_exception(exc)

aiohttp/web_request.py
814-
815-    def _cancel(self, exc: BaseException) -> None:
816:        self._payload.set_exception(exc)

aiohttp/web_protocol.py
578-                        self.close()
579-
580:                payload.set_exception(PayloadAccessError())

aiohttp/helpers.py
798-
799-
800:def set_exception(fut: "asyncio.Future[_T]", exc: BaseException) -> None:
801-    if not fut.done():
802:        fut.set_exception(exc)

I was running into this rather frequently when downloading larger files, like PDFs. I'm using iter_chunked() with custom decompression logic. I was able to mitigate it to the point where it's no longer troublesome by increasing the chunk size from 1 kB to 1 MB; after this change, the issue disappeared almost completely.

I suggest trying this out until a real fix is released.

Also, I suggest adding a recommendation to the docs on how to select a proper chunk size, as there really is no guidance beyond the example code.

In my case, not only did it get rid of the issue, it also made my requests something like 10 times faster for larger files. I guess the example's 1024 is too small even for common HTML documents.
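The chunk-size effect can be illustrated without aiohttp at all: per-chunk bookkeeping is paid once per iteration, so shrinking the chunk multiplies the number of iterations. A stdlib-only sketch (the 10 MiB payload and the two sizes are illustrative, not aiohttp internals):

```python
import io

def count_reads(stream: io.BufferedIOBase, chunk_size: int) -> int:
    """Count how many read() calls it takes to drain the stream."""
    calls = 0
    while stream.read(chunk_size):
        calls += 1
    return calls

body = b"x" * (10 * 2**20)  # a 10 MiB payload, e.g. a large PDF

small_chunks = count_reads(io.BytesIO(body), 1024)   # 1 kB chunks -> 10240 iterations
large_chunks = count_reads(io.BytesIO(body), 2**20)  # 1 MB chunks -> 10 iterations

print(small_chunks, large_chunks)
```

With `iter_chunked()` every iteration also round-trips through the event loop, so the gap in wall-clock time can be even larger than the raw iteration count suggests.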

I made #8089 (an extension/follow-up to #8049) to try to make errors set on the futures easier to trace.

I can reliably reproduce at least one case of this. In case it's helpful for anyone debugging this, here's some details so far.

If I modify connection_lost in client_proto.py to include the full exception like so:

        if self._parser is not None:
            try:
                uncompleted = self._parser.feed_eof()
            except Exception as e:
                if self._payload is not None:
                    self._payload.set_exception(
                        e#ClientPayloadError("Response payload is not completed")
                    )

Then I get the following exception in the app code:

    |   File "<redacted>/python3.11/site-packages/aiohttp/client_proto.py", line 82, in connection_lost
    |     uncompleted = self._parser.feed_eof()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "aiohttp/_http_parser.pyx", line 510, in aiohttp._http_parser.HttpParser.feed_eof
    | aiohttp.http_exceptions.ContentLengthError: 400, message:
    |   Not enough data for satisfy content length header.

Some additional bits of info:

  • TCP dump confirms the other side is not closing the connection
  • My repro is clearly racy - it ends up here 9 times out of 10, but sometimes completes without error
  • If I set a breakpoint or add a print statement in the close method in client_proto.py, the problem goes away... but also nothing gets printed and the breakpoint never triggers?

Update: I think I was using the API incorrectly? I didn't realize ClientSession.request returned a context manager, I just used regular await on it. My original - failing with the issue in this ticket - code looked like so:

async def example(http: aiohttp.ClientSession, headers, data):
    res = await http.request("POST", "http://example.com",
                        headers=headers, data=data)
    body = await res.read()
    return body

changing it to use the context manager seems to have resolved the issue for me:

async def example(http: aiohttp.ClientSession, headers, data):
    async with http.request("POST", "http://example.com",
                        headers=headers, data=data) as res:
        body = await res.read()
        return body

I am pretty sure both versions of that code should do the same thing...
It would be great if you could figure out a full test in a PR to reproduce the issue.
Note we've also got a PR this week which will add the underlying exception information in future versions of aiohttp, so you won't need to modify the code in the future to figure that out.

I am pretty sure both versions of that code should do the same thing...

I think closing the response might perform the closing network exchange more predictably. Without it, perhaps a timeout happens before the event loop gets to it.

@jakewins if you want to get more details before my PR is merged and released, you could apply https://github.com/aio-libs/aiohttp/pull/8089.diff which extends what's revealed in tracebacks.

I am pretty sure both versions of that code should do the same thing...

I think closing the response might perform the closing network exchange more predictably. Without it, perhaps a timeout happens before the event loop gets to it.

I've been involved with that code a fair bit last year, in both cases the connection should be closed upon returning from the .read() call.

@Dreamsorcerer aye; looks like you're right, changing the prod code to use the context manager did not make the error go away, though it seems to have reduced the rate a lot, maybe 80% or so, but it's still happening.

The repro I have depends on running one of our services locally and letting it make this specific call asking for a large payload into our infrastructure via VPN, so I can't share it as-is, but I will try next week and create a free-standing repro

Hey everyone, I just released v3.9.4rc0. It'll hopefully help us get more complete tracebacks, and that, in turn, will let us know the underlying causes for this exception being raised; there may be several distinct behaviors underneath, and we need to discover what they are.

If you have ability to do so, please try out this pre-release in environments where you can reproduce your problems and come back with the logs with complete tracebacks. Let's figure this out together!

@jakewins @xaralis @rlippmann @lmazuel @ShantanuNair @TheOnlyWayUp @gelodefaultbrain @junbaibai0719 @joelcorporan @aralroca @a-gn @tharradine @tildedave @WH-2099 @pablogsal @hauntsaninja

I'm hitting the same problem. I use aiohttp to request a sub-URL inside a Flask async view function. About 10% of requests raise this issue during load testing; there is no error when testing requests one at a time.

@se7enXF what does the recent release candidate reveal in terms of the cause? Without a reproducer and logs/tracebacks, your message is not very useful.

Interestingly, ever since I deployed the pre-release, I stopped seeing this problem where it was happening 🤯

This means we'll have to rely on people here to try to reproduce it in their environments. Just remember to try out v3.9.4rc0 to get useful information about the causal chain.

@se7enXF what does the recent release candidate reveal in terms of the cause? Without a reproducer and logs/tracebacks, your message is not very useful.

My project uses Python 3.7, which is not compatible with 3.9.4rc0. Does 4.0.0a0 or 4.0.0a1 fix this issue? I will give them a try.

Python 3.7 itself is not supported and may have known security issues. We won't put any effort into supporting a version of Python which is not itself supported in any way, so you'll need to upgrade to a more recent version to test anything or get any support from us.

Does 4.0.0a0 or 4.0.0a1 fix this issue?

No, those are very old pre-releases for old versions of the master branch. Don't use them.

@Dreamsorcerer @webknjaz
Thanks for your reply. 3.9.4rc0 works for me with Python 3.8!
Now it raises a timeout error, which is reasonable under load testing.

I am pretty sure both versions of that code should do the same thing...

I think closing the response might perform the closing network exchange more predictably. Without it, perhaps a timeout happens before the event loop gets to it.

I've been involved with that code a fair bit last year, in both cases the connection should be closed upon returning from the .read() call.

I'm not sure if this is relevant here, but maybe the thing to bear in mind is that, when calling stuff like transport.close(), a lot of the callbacks that result from it run after an asyncio.call_soon(), so some things you may expect to be true once transport.close() returns may not yet be true?

https://github.com/python/cpython/blob/c951e25c24910064a4c8b7959e2f0f7c0d4d0a63/Lib/asyncio/selector_events.py#L857-L865
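That deferral is easy to demonstrate with a stdlib-only sketch (no aiohttp involved): transport.close() only *schedules* connection_lost() via call_soon(), so a flag set inside the callback is still unset right after close() returns and only flips on the next loop iteration.

```python
import asyncio

class Probe(asyncio.Protocol):
    """Records when connection_lost() actually fires."""
    def __init__(self):
        self.lost = False

    def connection_lost(self, exc):
        self.lost = True

async def main():
    # A throwaway local server so we have a real transport to close.
    server = await asyncio.start_server(lambda r, w: None, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()[:2]
    loop = asyncio.get_running_loop()
    transport, proto = await loop.create_connection(Probe, host, port)

    transport.close()
    lost_right_after_close = proto.lost  # False: callback only scheduled via call_soon
    await asyncio.sleep(0)               # let the loop run pending callbacks
    lost_after_one_tick = proto.lost     # True: connection_lost has now run

    server.close()
    await server.wait_closed()
    return lost_right_after_close, lost_after_one_tick

result = asyncio.run(main())
print(result)
```

So any code that inspects protocol state immediately after transport.close() is racing against the scheduled connection_lost() callback.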

I have a reproducer:

  + Exception Group Traceback (most recent call last):
  |   File "/var/src/aiohttp/test.py", line 42, in <module>
  |     asyncio.run(main())
  |   File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/usr/lib64/python3.12/asyncio/base_events.py", line 685, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/var/src/aiohttp/test.py", line 37, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "/usr/lib64/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/var/src/aiohttp/aiohttp/client_proto.py", line 83, in connection_lost
    |     uncompleted = self._parser.feed_eof()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/var/src/aiohttp/aiohttp/http_parser.py", line 277, in feed_eof
    |     self._payload_parser.feed_eof()
    |   File "/var/src/aiohttp/aiohttp/http_parser.py", line 788, in feed_eof
    |     raise TransferEncodingError(
    | aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.
    |
    | The above exception was the direct cause of the following exception:
    |
    | Traceback (most recent call last):
    |   File "/var/src/aiohttp/test.py", line 15, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "/var/src/aiohttp/aiohttp/streams.py", line 44, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/var/src/aiohttp/aiohttp/streams.py", line 347, in read
    |     raise self._exception
    | aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Unfortunately I accidentally deleted my fugazi server; give me a moment while I rewrite it.

https://gist.github.com/gtedesco-r7/0276fc7a00ebe568afc7132ce64b011c

It explodes for me after a couple of minutes:

<<< 8327 bytes
  + Exception Group Traceback (most recent call last):
  |   File "/var/src/aiohttp/test.py", line 42, in <module>
  |     asyncio.run(main())
  |   File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/usr/lib64/python3.12/asyncio/base_events.py", line 685, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/var/src/aiohttp/test.py", line 37, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "/usr/lib64/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/var/src/aiohttp/aiohttp/client_proto.py", line 83, in connection_lost
    |     uncompleted = self._parser.feed_eof()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/var/src/aiohttp/aiohttp/http_parser.py", line 277, in feed_eof
    |     self._payload_parser.feed_eof()
    |   File "/var/src/aiohttp/aiohttp/http_parser.py", line 788, in feed_eof
    |     raise TransferEncodingError(
    | aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.
    |
    | The above exception was the direct cause of the following exception:
    |
    | Traceback (most recent call last):
    |   File "/var/src/aiohttp/test.py", line 15, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "/var/src/aiohttp/aiohttp/streams.py", line 44, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/var/src/aiohttp/aiohttp/streams.py", line 347, in read
    |     raise self._exception
    | aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed
    +------------------------------------

real    1m50.561s
user    0m48.431s
sys     0m7.836s
HttpPayloadParser::feed_eof() - PARSE_CHUNKED
<class 'aiohttp.streams.StreamReader'> <StreamReader 82000 bytes>
155728 82000

Yeah, feed_eof() is being called while StreamReader still has 82000 bytes in its buffer. Seems like that should be impossible, right?

The weird thing is that even my little bit of debugging seems to turn it into a TimeoutError; I had to get rid of the read timeout in order to get it to reliably crash in the usual way.

I don't remember the details; if you think you can dig into the code and figure out exactly what's going wrong, that'd be great.

@gtedesco-r7 is that with version 3.9.4rc0? Pure-python or Cython/C-extension?

I still haven't gotten any instances of this exception in my deployments since Feb 16.

    |   File "/var/src/aiohttp/aiohttp/streams.py", line 347, in read
    |     raise self._exception

@gtedesco-r7 this line number correlates with code placement in an older aiohttp 3.9.3. Could you check your repro against 3.9.4rc0?

@gtedesco-r7 your gist doesn't reproduce the problem for me under Python 3.11. Neither aiohttp 3.9.3 nor 3.9.4rc0 breaks like that. Though, it appears you might be using the pure-Python implementation of the parser and not the Cython-based one that's the default when available.

@gtedesco-r7 based on the above observation, your specific case should be reproducible with AIOHTTP_NO_EXTENSIONS=1 set but I'm still unable to confirm it so far.

UPD: it took 6m 13s on my machine and the reproducer in the gist produced a similar exception with Python 3.11, aiohttp 3.9.3 and AIOHTTP_NO_EXTENSIONS=1.

UPD: it took 1m 21s of running the repro with Python 3.11, aiohttp 3.9.4rc0 and AIOHTTP_NO_EXTENSIONS=1 to get the following (more detailed) traceback:

  + Exception Group Traceback (most recent call last):
  |   File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/aiohttp-test.py", line 42, in <module>
  |     asyncio.run(main())
  |   File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/runners.py", line 190, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/aiohttp-test.py", line 37, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/taskgroups.py", line 147, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/client_proto.py", line 94, in connection_lost
    |     uncompleted = self._parser.feed_eof()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/http_parser.py", line 279, in feed_eof
    |     self._payload_parser.feed_eof()
    |   File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/http_parser.py", line 793, in feed_eof
    |     raise TransferEncodingError(
    | aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.
    | 
    | The above exception was the direct cause of the following exception:
    | 
    | Traceback (most recent call last):
    |   File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/aiohttp-test.py", line 15, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/streams.py", line 50, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/streams.py", line 357, in read
    |     raise self._exception
    | aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
    +------------------------------------

And on the server side:

ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-19080' coro=<fugazi_web() done, defined at ~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/fake-server.py:8> exception=ConnectionResetError(104, 'Connection reset by peer')>
Traceback (most recent call last):
  File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/fake-server.py", line 13, in fugazi_web
    req = await rd.read(8192)
          ^^^^^^^^^^^^^^^^^^^
  File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/streams.py", line 694, in read
    await self._wait_for_data('read')
  File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/streams.py", line 526, in _wait_for_data
    await self._waiter
  File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/selector_events.py", line 995, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer

The Cython-based implementation still doesn't crash with that snippet. So it's either fast enough for the race not to happen, or the implementation doesn't have the same bug as pure-python.

Looking into the source, it's clear that connection_lost() was called on a clean connection close (with exc=None, that is). Inside, it may call self._payload_parser.feed_eof(), which immediately suppresses any exception; this first call, under if self._payload_parser is not None:, would have raised a <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>, but that remains hidden. Looking into https://github.com/aio-libs/aiohttp/blob/v3.9.4rc0/aiohttp/http_parser.py#L793, it's obvious that this can happen only if the HttpPayloadParser() instance was initialized with chunked=True: self._type = ParseState.PARSE_CHUNKED is set once in the initializer and never mutated later. Any call to HttpPayloadParser.feed_eof() on an object initialized for chunked parsing ends up like this, so the expectation is that it shouldn't really be called in chunked mode at all.

Back to connection_lost(): it then proceeds to call self._parser.feed_eof() under if self._parser is not None:. self._parser.feed_eof() itself also makes a call to the same HttpPayloadParser.feed_eof() at the very beginning and should then set the internal attribute self._payload_parser = None, which never happens because, again, feed_eof() explodes one line earlier. The exception bubbles up back to our connection_lost(), but this time it's not suppressed silently: under if self._payload is not None: it's reraised as a ClientPayloadError(), even though the network connection closed cleanly (the exception message would contain an extra sentence if it hadn't).

So this means that if, by the time ResponseHandler.connection_lost() is called, something had set ResponseHandler._payload = None or ResponseHandler._parser = None, the exception wouldn't be raised. Additionally, if we were to suppress TransferEncodingError around that HttpPayloadParser.feed_eof() call in ResponseHandler.connection_lost(), this would cover the case of illegally calling HttpPayloadParser.feed_eof() in chunked mode, where it always raises TransferEncodingError.
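The shape of that suppression idea, as a self-contained toy sketch. Every name below is a local stand-in defined here, not the real aiohttp object; the real mitigation would live inside ResponseHandler.connection_lost():

```python
import contextlib

class TransferEncodingError(Exception):
    """Stand-in for aiohttp.http_exceptions.TransferEncodingError."""

class ChunkedPayloadParser:
    """Toy model of HttpPayloadParser initialized with chunked=True:
    per the analysis above, its feed_eof() unconditionally raises."""
    def feed_eof(self):
        raise TransferEncodingError(
            "Not enough data for satisfy transfer length header."
        )

# Proposed shape of the mitigation: on a clean close (exc=None), treat
# the chunked-mode EOF error as expected instead of reraising it as
# a ClientPayloadError.
parser = ChunkedPayloadParser()
with contextlib.suppress(TransferEncodingError):
    parser.feed_eof()
print("clean close handled without an exception escaping")
```

The open question, of course, is whether this only hides the symptom: if the terminating zero-size chunk really never arrived, the payload genuinely is incomplete and swallowing the error would be wrong.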

@Dreamsorcerer @bdraco thoughts? ideas?

I couldn't reproduce it after 30 mins. I'd probably need to inspect the raw data going through the parser to understand it properly.

@Dreamsorcerer sometimes, it was non-reproducible at all for me too. But given enough time, it's reproducible with the pure-python implementation. Make sure to have that env var set, just in case. I was also adding a print() to ensure Cython doesn't get selected.

It took about 45 minutes to happen on my system

  + Exception Group Traceback (most recent call last):
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 45, in <module>
  |     asyncio.run(main())
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 39, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/bdraco/aiohttp/aiohttp/client_proto.py", line 94, in connection_lost
    |     uncompleted = self._parser.feed_eof()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/bdraco/aiohttp/aiohttp/http_parser.py", line 284, in feed_eof
    |     self._payload_parser.feed_eof()
    |   File "/Users/bdraco/aiohttp/aiohttp/http_parser.py", line 799, in feed_eof
    |     raise TransferEncodingError(
    | aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.
    | 
    | The above exception was the direct cause of the following exception:
    | 
    | Traceback (most recent call last):
    |   File "/Users/bdraco/aiohttp/4581_client.py", line 16, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 50, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 357, in read
    |     raise self._exception
    | aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
    +------------------------------------

@bdraco it was inconsistent for me too. I'm mostly curious what you think of what I wrote in #4581 (comment). What I suggested should work in theory but I'm worried it's only addressing the consequence and not the root cause.

I'm poking at it a bit to see if I can find the root cause but given the long test cycle it may take a while and I'm time limited for the next few days

Looks like sometimes I get

  + Exception Group Traceback (most recent call last):
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 45, in <module>
  |     asyncio.run(main())
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 39, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/bdraco/aiohttp/4581_client.py", line 16, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 50, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 393, in read
    |     await self._wait("read")
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 311, in _wait
    |     with self._timer:
    |   File "/Users/bdraco/aiohttp/aiohttp/helpers.py", line 735, in __exit__
    |     raise asyncio.TimeoutError from None
    | TimeoutError
    +------------------------------------

Interesting... I never got that timeout. But it makes sense, given that there seems to be a race between something unsetting ResponseHandler._parser / ResponseHandler._payload and ResponseHandler.connection_lost() being called.

Sorry - was there an official fix for this? I am able to reproduce when utilizing aiohttp through aiobotocore.

Basically:

  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/de_common/aws/s3.py", line 229, in get_object
    body = await response["Body"].read()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiobotocore/response.py", line 56, in read
    chunk = await self.__wrapped__.content.read(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 383, in read
    block = await self.readany()
            ^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 405, in readany
    await self._wait("readany")
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 312, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>

Anyone have any suggestions or thoughts? I'm using aiohttp==3.9.5

I think I have a reliably reproducible example that is derived from @gtedesco-r7's example
https://gist.github.com/toffeegryphon/6a94a7883923f80c2259bbb297bb0d3b

Python 3.12
aiohttp 3.9.3

A ClientPayloadError(TransferEncodingError) is raised when the server closes the connection (or rather, when the client realizes the connection is closed) before the client reads the end-of-response zero-size chunk. A ServerDisconnectedError is raised when the server closes the connection after the end-of-response chunk is read (specifically, when a TCP connector is reused). A 104 (connection reset by peer) seems to be raised occasionally; I believe it happens when the client realizes the connection is closed while the response buffer is being read, but I haven't looked too closely at whether that's true.
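The end-of-response zero-size chunk mentioned here is part of the chunked transfer coding itself. A stdlib-only sketch (using http.client against canned bytes, nothing aiohttp-specific) shows how a connection close before that terminator turns a clean EOF into a truncated payload:

```python
import io
import http.client

class FakeSock:
    """Just enough of the socket API for http.client.HTTPResponse."""
    def __init__(self, data: bytes):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file

raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n"  # one data chunk
    b"0\r\n\r\n"       # the end-of-response zero-size chunk
)

# With the terminator present, EOF is clean and the body parses fully.
resp = http.client.HTTPResponse(FakeSock(raw))
resp.begin()
body = resp.read()  # b"hello"

# Drop the terminator: the same close now looks like a truncated payload.
resp2 = http.client.HTTPResponse(FakeSock(raw.replace(b"0\r\n\r\n", b"")))
resp2.begin()
try:
    resp2.read()
    truncated_failed = False
except http.client.IncompleteRead:
    truncated_failed = True
```

aiohttp's TransferEncodingError on connection_lost() is the analogous situation: the parser hit EOF while still waiting for the `0\r\n\r\n` terminator.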

Was there traction on this (or one of the other incredibly similar errors)? After spending a while reading all sorts of random posts from folks, I suspect I'm getting some form of throttling that is causing things to get weird. I can't seem to get a better error message no matter what I try, probably because one of the libraries in the middle eats it or something.

I'm downloading a 124.5MB file from Google Cloud using the gcloud-aio library and fastapi with a streaming response. And I intermittently get this exception. My download takes roughly 4-5 seconds when working correctly. But if I try it a few times in a row (say like 5 or 6), it errors out instead of working.

aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>. SSLError(1, '[SSL] record layer failure (_ssl.c:2580)')

I suspect this is a me problem in some manner, but I can't seem to figure out what issue I'm trying to fix.

Here's my fastapi endpoint:

import aiohttp
from fastapi import Request
from fastapi.responses import StreamingResponse
from gcloud.aio.storage import Storage  # gcloud-aio-storage

async def download(request: Request) -> StreamingResponse:
    async def generate_response():
        # we need to make a new session inside this generator
        #  because the one on the request context is closed when we return
        #  the streaming response, and then it goes byebye and fails
        #  the streaming download out with a partial file error.
        async with aiohttp.ClientSession(raise_for_status=True) as blob_session:
            new_storage_client = Storage(session=blob_session)
            # bucket and file come from the surrounding request context
            async with await new_storage_client.download_stream(bucket, file) as stream:
                yield await stream.read()
    return StreamingResponse(generate_response())

For our situation, we've figured out that, yeah, it was legitimately the connection being broken from the server side (i.e. no clear bug in aiohttp). We had Nginx as a load balancer/reverse proxy in front of our servers, and whenever a reload happened, the workers would get killed some time after, and any long-running requests would get forcefully terminated (and correctly reported as an error).

@webknjaz @bdraco Think one of you could try the reproducer again on master? I'm suspecting it might be fixed by the chunking issue I sorted out recently.

I can give it a shot this weekend when I'll have the ability to let it run for an extended period (assuming my reattempt at a long travel day tomorrow doesn't go as poorly as my attempt on Wednesday and I actually get to my destination)

Hey! I'm encountering a similar issue where the connection appears to be terminated on the server, resulting in a ClientPayloadError(TransferEncodingError) on the client side. This occurs rarely and is difficult to reproduce.

Would simply retrying the request on the client side be sufficient to handle this error, or would it be necessary to recreate the aiohttp session (or implement any special error-handling logic)? Will aiohttp attempt to reconnect to the server automatically upon a retry?

The project you've referenced above appears to be using 3.10.3; the fix I mentioned above is in 3.10.4. If you're still experiencing problems on the latest release, you'll have to try and figure out a reproducer or some way to log the raw chunks. However, you should first verify whether you are using the compiled C parser (e.g. wheels from pip) or the Python parser (which can be forced with the AIOHTTP_NO_EXTENSIONS=1 envvar).

If the server is sending malformed responses, then the behaviour appears correct. If you send another request to retry, aiohttp should close the bad connection and create a new connection for the next request. If the responses are correct, then we'd need to find the cause of the issue here.
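In other words, recreating the session should not be necessary. A minimal retry sketch (the `fetch_with_retry` helper and the backoff values are illustrative, not an aiohttp API; the connector discards the broken connection, so each retry transparently gets a fresh one):

```python
import asyncio

import aiohttp


async def fetch_with_retry(session, url, retries=3):
    """Retry a GET on ClientPayloadError using the same session."""
    for attempt in range(retries):
        try:
            async with session.get(url) as resp:
                return await resp.read()
        except aiohttp.ClientPayloadError:
            if attempt == retries - 1:
                raise  # out of attempts, surface the error
            # simple exponential backoff before retrying
            await asyncio.sleep(0.1 * 2 ** attempt)
```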

@Dreamsorcerer I'm afraid I'm at capacity these days, so I'll defer to @bdraco for now.

I started running the test client again. I'll leave it for a few hours and report back

while python3 4581_client.py; do :; done

No longer reproducible on 3.10.5. I'd say this is fixed

I'm getting this error with the following:

male_doctor = 'Arzt'     # works
FEMALE_DOCTOR = 'Ärztin' # THROWS

async with aiohttp.ClientSession(
    connector=aiohttp.TCPConnector(ttl_dns_cache=300, enable_cleanup_closed=True),
    cookie_jar=aiohttp.DummyCookieJar(),
    raise_for_status=True,
    timeout=aiohttp.ClientTimeout(sock_connect=10)
) as session:
    async with session.get('https://www.linguee.com/english-german/search',
                           params={'query': FEMALE_DOCTOR}) as rsp:
        txt = await rsp.text()

The server returns slightly different responses to successful and unsuccessful queries.

Arzt has Content-Length and no 'Transfer-Encoding': 'chunked':

<ClientResponse(https://www.linguee.com/english-german/search?query=Arzt) [200 None]>
<CIMultiDictProxy('Date': 'Thu, 03 Oct 2024 21:12:17 GMT', 'Content-Type': 'text/html; charset="iso-8859-15"', 'Content-Length': '24220', 'Cache-Control': 'public, max-age=86400', 'x-trace-id': '0f2ae5c129b94c428e4173bc79e1b87f', 'Set-Cookie': 'ForeignLang=DE; SameSite=Strict; Max-Age=63072000; Path=/', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'br', 'Age': '5461', 'x-linguee-cache-status': 'HIT', 'x-cache-status': 'MISS', 'strict-transport-security': 'max-age=63072000; includeSubDomains; preload', 'server-timing': 'l7_lb_tls;dur=363, l7_lb_idle;dur=0, l7_lb_receive;dur=0, l7_lb_total;dur=364', 'Access-Control-Expose-Headers': 'Server-Timing, X-Trace-ID')>

Ärztin doesn't have Content-Length, but has 'Transfer-Encoding': 'chunked':

<ClientResponse(https://www.linguee.com/english-german/search?query=%C3%84rztin) [200 None]>
<CIMultiDictProxy('Date': 'Thu, 03 Oct 2024 21:17:26 GMT', 'Content-Type': 'text/html; charset="iso-8859-15"', 'Transfer-Encoding': 'chunked', 'Cache-Control': 'public, max-age=86400', 'x-trace-id': 'da7d20f6eba742e1b1d801e9ab56b688', 'Set-Cookie': 'ForeignLang=DE; SameSite=Strict; Max-Age=63072000; Path=/', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'br', 'Age': '5769', 'x-linguee-cache-status': 'HIT', 'x-cache-status': 'MISS', 'strict-transport-security': 'max-age=63072000; includeSubDomains; preload', 'server-timing': 'l7_lb_tls;dur=440, l7_lb_idle;dur=0, l7_lb_receive;dur=0, l7_lb_total;dur=445', 'Access-Control-Expose-Headers': 'Server-Timing, X-Trace-ID')>

May be a server problem, but I have no control over it, and browser and curl manage to parse the response successfully.

I'm on Linux.

Python 3.12.3

Name: aiohttp
Version: 3.10.8

Name: multidict
Version: 6.1.0

Name: yarl
Version: 1.13.1

Stacktrace:

---------------------------------------------------------------------------
TransferEncodingError                     Traceback (most recent call last)
File /opt/venv/net/lib/python3.12/site-packages/aiohttp/client_proto.py:92, in ResponseHandler.connection_lost(self, exc)
     91 try:
---> 92     uncompleted = self._parser.feed_eof()
     93 except Exception as underlying_exc:

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/_http_parser.pyx:513, in aiohttp._http_parser.HttpParser.feed_eof()

TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.

The above exception was the direct cause of the following exception:

ClientPayloadError                        Traceback (most recent call last)
Cell In[29], line 8
      1 async with aiohttp.ClientSession(
      2         connector=aiohttp.TCPConnector(ttl_dns_cache=300, enable_cleanup_closed=True),
      3         cookie_jar=aiohttp.DummyCookieJar(),
      4         raise_for_status=True,
      5         timeout=aiohttp.ClientTimeout(sock_connect=10)
      6     ) as session:
      7     async with session.get('https://www.linguee.com/english-german/search', params={'query': 'Ärztin'}) as rsp:
----> 8         txt = await rsp.text()

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/client_reqrep.py:1220, in ClientResponse.text(self, encoding, errors)
   1218 """Read response payload and decode."""
   1219 if self._body is None:
-> 1220     await self.read()
   1222 if encoding is None:
   1223     encoding = self.get_encoding()

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/client_reqrep.py:1178, in ClientResponse.read(self)
   1176 if self._body is None:
   1177     try:
-> 1178         self._body = await self.content.read()
   1179         for trace in self._traces:
   1180             await trace.send_response_chunk_received(
   1181                 self.method, self.url, self._body
   1182             )

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/streams.py:386, in StreamReader.read(self, n)
    384 blocks = []
    385 while True:
--> 386     block = await self.readany()
    387     if not block:
    388         break

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/streams.py:408, in StreamReader.readany(self)
    404 # TODO: should be `if` instead of `while`
    405 # because waiter maybe triggered on chunk end,
    406 # without feeding any data
    407 while not self._buffer and not self._eof:
--> 408     await self._wait("readany")
    410 return self._read_nowait(-1)

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/streams.py:315, in StreamReader._wait(self, func_name)
    313 try:
    314     with self._timer:
--> 315         await waiter
    316 finally:
    317     self._waiter = None

ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>

The server closes the connection without completing the message.
Browsers at least tend to render data as it's received, without worrying so much about whether they've received the full message or not.

If we wanted to support something like this, we'd need some way to signal to the user that the message is probably not complete, i.e. there is no way for us to know whether what we received is a valid message or only part of one (strictly speaking, it is only part of a message).

I don't see it working with curl though:

> curl 'https://www.linguee.com/english-german/search?query=%C3%84rztin'
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
> curl 'https://www.linguee.com/english-german/search?query=Ärztin'
<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>

Warning: Binary output can mess up your terminal

Had the same response for both Arzt and Ärztin. When I said curl is working, I meant that there is no difference between a throwing and a non-throwing response as far as curl is concerned. I haven't played with command-line options; there was probably some way to decode the binary output.

Apparently they've fixed the server: I can't reproduce the problem anymore. curl also prints HTML to the console now.

If we wanted to support something like this, we'd need some way to signal to the user that the message is probably not complete.

Would be appreciated; apparently the problem pops up from time to time. Maybe change ClientPayloadError: Response payload is not completed to a specific ClientIncompleteResponse: The server closed the connection without completing the message, and then make the partial response available? Someone could catch ClientIncompleteResponse and get whatever was received via rsp.text().

Had the same response for both Arzt and Ärztin. When I said curl is working I meant that there is no difference between a throwing and a non-throwing response as far as curl is concerned.

Huh, I didn't test curl with the other one. But I was pretty sure it was producing binary output because the message was incomplete: the binary output was the raw brotli-encoded bytes, i.e. curl didn't attempt to decompress the response.

Maybe change ClientPayloadError: Response payload is not completed to a specific ClientIncompleteResponse: The server closed the connection without completing the message, and then make the partial response available? Someone could catch ClientIncompleteResponse and get whatever was received via rsp.text().

If you're using a .read() variant that loads the whole body into memory anyway, then that could possibly work. If someone wants to try and implement that, we'll review it. But, given it is literally invalid HTTP behaviour and very rarely encountered, I doubt any maintainers will volunteer to work on it.