Received "Response payload is not completed" when reading response
Closed this issue · 88 comments
🐞 Describe the bug
A `ClientPayloadError: Response payload is not completed` exception is raised when reading the response of a GET request to the GitHub REST API (the specific API should not be especially relevant). The response seems correct, and the same request with `curl` succeeds every time.
💡 To Reproduce
Make many requests against an endpoint of the GitHub API (the error happens from time to time).
💡 Expected behavior
The response is parsed correctly.
📋 Logs/tracebacks
File "/opt/lib/python3.8/site-packages/gidgethub/aiohttp.py", line 20, in _request
return response.status, response.headers, await response.read()
File "/opt/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 973, in read
self._body = await self.content.read()
File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 358, in read
block = await self.readany()
File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 380, in readany
await self._wait('readany')
File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 296, in _wait
await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed
Some contextual information at the time of the exception, at aiohttp/streams.py, line 358, in read:
blocks = [b'[\n {\n "id": 2941521,\n "node_id": "MDU6TGFiZWwyOTQxNTIx",\n "url": "https://REDACTED_GITHUB_URL/repos/testorg/test-debian/labels/skip-issue",\n "name": "skip-issue",\n "color": "000000",\n "default": false\n }\n]\n']
n=-1
As you can see, the blocks contain the whole payload (the full JSON), but aiohttp still complains about the payload not being completed.
📋 Your version of Python
$ python --version
Python 3.8.1
📋 Your version of the aiohttp/yarl/multidict distributions
$ python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: async-timeout, attrs, chardet, multidict, yarl
Required-by: raven-aiohttp, pytest-aiohttp, aiojobs
$ python -m pip show multidict
Name: multidict
Version: 4.7.3
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.4.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /Users/pgalindo3/.pyenv/versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp
📋 Additional context
I suspect this has to do with this comment:
Lines 352 to 356 in 1740c99
as this is where it is failing, and the situation seems to be the one described in the TODO.
CC: @socketpair @asvetlov, as the TODO was added in 5c4cb82
Changing the `while` to `if` seems to be of no help.
Hi there, I've received this error too. My use case is a reverse-proxy in front of gunicorn. The other reports of this instance on this repo appear to be poorly behaved HTTP servers, which I think is a separate issue.
My code streams the proxy body back to upstream, using code roughly like (note the `prepare()` call, which `StreamingResponse` requires before writing):
response = web.StreamingResponse(headers=response_headers, status=status)
await response.prepare(request)
async for chunk in gunicorn_response.content.iter_chunked(64 * 1_024):
    await response.write(chunk)
await response.write_eof()
return response
What seems to be happening is that gunicorn returns a Connection: close header and then there is a race condition between aiohttp reading the body and the connection being closed. If I get the data out of aiohttp in time, it works, but sometimes this stack trace is triggered.
Investigation:
- Calling `await gunicorn_response.content.wait_eof()` prior to calling `iter_chunked` reduces the incidence of the error, but it still happens occasionally.
- Setting the `Connection: keep-alive` header in the reverse proxy's request to gunicorn resolves this issue. This is an acceptable workaround for me.
- Changing the `while` to an `if` in the source code, in conjunction with calling `wait_eof()` prior to iterating over the body, resolves the issue.
We have this happening between an aiohttp 3.8.1 client and server in production; trying to reproduce.
I've just experienced the same issue via s3fs, downloading a few hundred files from S3.
Python: 3.7.12
aiohttp: 3.8.1
yarl: 1.7.2
multidict: 5.2.0
Traceback (most recent call last):
File "dataset_io.py", line 336, in download
filesystem.get(self._location, self._download_location, recursive=True)
File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 91, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 71, in sync
raise return_result
File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 25, in _runner
result[0] = await coro
File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 529, in _get
coros, batch_size=batch_size, callback=callback
File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 249, in _run_coros_in_chunks
await asyncio.gather(*chunk, return_exceptions=return_exceptions),
File "/root/.pyenv/versions/3.7.12/lib/python3.7/asyncio/tasks.py", line 414, in wait_for
return await fut
File "/.venv/lib/python3.7/site-packages/s3fs/core.py", line 1002, in _get_file
chunk = await body.read(2 ** 16)
File "/.venv/lib/python3.7/site-packages/aiobotocore/response.py", line 53, in read
chunk = await self.__wrapped__.read(amt if amt is not None else -1)
File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 385, in read
await self._wait("read")
File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 304, in _wait
await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed
Sorry I don't have much more information here, other than the fact that this is occurring on Python 3.7, whereas #4549 has been described as occurring on Python 3.8 only.
I got this error on Python 3.9 with aiohttp 3.8.1 on Windows. I haven't been able to reproduce.
My server failed around the same time as the error. The backoff library re-ran the request, and I got the ClientPayloadError. Is this expected? Shouldn't losing the server raise ConnectionError instead?
I have the same issue with Python 3.10
Same error here!
> I've just experienced the same issue via s3fs, downloading a few hundred files from S3.
I am also getting that error using iter_chunked when downloading files from S3
Every time I download the last few files, this error message appears
Hi, is there any fix for this? Appreciate it.
My services occasionally crash with this error, on Python 3.10.12 with aiohttp 3.8.5.
What are our obstacles in isolating this issue, before we can fix it?
I'll ensure I hold on to the logs the next time it arises, do we know what's causing it?
Thanks!
> What are our obstacles in isolating this issue, before we can fix it?
Not sure. Maybe we can figure out a message that causes the problem, or maybe it needs multiple steps to trigger.
It looks like it might be related to #5238, so maybe we can get that PR updated and merged.
I'll try that PR out, will report if something breaks
@Dreamsorcerer @TheOnlyWayUp Did you guys seem to figure this out? May be a session setting issue wrt stale connections. I am running into this issue with langchainjs which uses openAI's python client.
All discussion is above, so feel free to follow up. PR #5238 could be worked on to get it merged, or if you can figure out a test (that runs in our CI) to reproduce the issue, then that would be a great help.
I started to have this issue with 3.8.6 on PyPy 3.9.18: consistently on CI, and 80% of the time on my local box, for one very specific streaming unittest (so I can trigger it quite easily with a simple pytest run). It was working fine on 3.8.5.
Funny thing is that, reading your changelog, I saw the change for lenient parsing in llhttp and the `-X dev` mode that does the parsing differently. So, out of ideas, I tried to run pytest with `python -X dev -m pytest mytest`, and suddenly the error is gone. If I remove the `-X dev` again, the error is back. Strangely, this works great on any other Python version.
Hope this helps somehow.
The stacktrace is consistent with everyone else's in this thread:
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/client_reqrep.py:1043: in read
self._body = await self.content.read()
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:375: in read
block = await self.readany()
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:397: in readany
await self._wait("readany")
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:304: in _wait
await waiter
/usr/lib/pypy3.9/asyncio/futures.py:284: in __await__
yield self # This tells Task to wait for completion.
/usr/lib/pypy3.9/asyncio/tasks.py:328: in __wakeup
future.result()
> I tried to run pytest with `python -X dev -m pytest mytest`, and suddenly the error is gone. If I remove the `-X dev` again, the error is back.
Hmm, that's surprising. I wonder if strict parsing is causing the connection to get closed without raising an error somehow, thus forcing it to open a new connection...
I can confirm this on Python 3.11/aiohttp 3.8.6.
If I downgrade to 3.8.5, I don't have a problem if I do:
if method == "GET":
    await session.get(...)
elif method == "POST":
    await session.post(...)
Strangely, if I just do session.request(method, ...), I get the error on 3.8.5 as well.
The possible fix (depending on what really is the root cause) is in 3.9.0b1, so please try to reproduce with that.
Nope, still failing.
Then someone will need to do some more debugging and/or figure out a reliable test that we can use to reproduce the behaviour.
Another data point: this is happening to our Octomachinery-based bots (Chronographer and Patchback), which are built on top of gidgethub+aiohttp, just like in Pablo's traceback.
It's happening under CPython 3.9.13 on the quay.io/fedora/python-39:latest image, with aiohttp 3.6.2. The bots are running Sentry, so it's possible to look a bit deeper into the tracebacks: sanitizers/octomachinery#59.
Upon inspecting them, I noticed that they are related to GET and PATCH requests to the GitHub APIs. Nothing really useful.
Judging from the other hints in this issue, this might be reproducible in slow envs, like with high network latency or PyPy (which is often slower closer to its startup, before the JIT warms up).
Back to gidgethub: it uses aiohttp.ClientSession.request() — https://github.com/gidgethub/gidgethub/blob/7474cae/gidgethub/aiohttp.py#L19 — which is in line with one of the comments above suggesting that calling dedicated methods like .get() and .patch() instead of .request() might yield differing behavior.
Anyway, having Sentry in place, we could probably make some educated guesses and instrument my bots to log more things. If anybody has ideas, I'm all ears...
Comparing .get(), .patch() and .request(), I concluded that they are virtually equivalent and haven't changed across versions. So your observation, @rlippmann, must be caused by something else. Also note that all of these methods return async context managers and should be used through async with, not await, for proper cleanup of the underlying response object.
So the exception is set here:
aiohttp/aiohttp/client_proto.py
Line 99 in 213d1b2
I wonder if one of these tests is a reproducer (when the exception processing is unsuppressed):
aiohttp/tests/test_web_sendfile_functional.py
Lines 951 to 989 in 213d1b2
aiohttp/tests/test_client_functional.py
Lines 1897 to 1913 in 213d1b2
aiohttp/tests/test_web_server.py
Lines 109 to 130 in 213d1b2
> So the exception is set here: aiohttp/aiohttp/client_proto.py, line 99 in 213d1b2
That code suppresses the actual exception. I think we need to at least log the original exception, or ideally parent the original exception (like raise ... from ...) somehow.
> That code suppresses the actual exception. I think we need to at least log the original exception, or ideally parent the original exception (like raise ... from ...) somehow.
Yes, I was also thinking that.
> aiohttp/tests/test_web_sendfile_functional.py, lines 951 to 989 in 213d1b2
When I injected a breakpoint here under the except block, and modified client_proto.py to include the causing exception details, I extracted this when running the specific test with pytest -s:
(Pdb) pp traceback.print_exc()
Traceback (most recent call last):
File "~/src/github/aio-libs/aiohttp/aiohttp/client_proto.py", line 95, in connection_lost
uncompleted = self._parser.feed_eof()
^^^^^^^^^^^^^^^^^^^^^^^
File "aiohttp/_http_parser.pyx", line 510, in aiohttp._http_parser.HttpParser.feed_eof
raise ContentLengthError(
aiohttp.http_exceptions.ContentLengthError: 400, message:
Not enough data for satisfy content length header.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "~/src/github/aio-libs/aiohttp/tests/test_web_sendfile_functional.py", line 985, in test_static_file_huge_cancel
data += await resp.content.read(1024)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/src/github/aio-libs/aiohttp/aiohttp/streams.py", line 347, in read
raise self._exception
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>
None
Hmm, should that exception be ignored, or should we not even be trying to feed EOF?
As far as I can see, that means the connection has been dropped before we've read the full response, and then we try to pretend that we received EOF in order to finish it off?
But, in the context of that test, this is clearly correct, expected behaviour currently:
> Reading from the stream may raise aiohttp.ClientPayloadError if the response object is closed before response receives all data or in case if any transfer encoding related errors like malformed chunked encoding of broken compression data.
> https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.ClientResponse.content
No idea... But I think that test may be a basis for a new one.
FTR this is what I did:
diff --git a/aiohttp/client_proto.py b/aiohttp/client_proto.py
index c42c9540d..cc5916a37 100644
--- a/aiohttp/client_proto.py
+++ b/aiohttp/client_proto.py
@@ -93,11 +93,13 @@ class ResponseHandler(BaseProtocol, DataQueue[Tuple[RawResponseMessage, StreamRe
if self._parser is not None:
try:
uncompleted = self._parser.feed_eof()
- except Exception:
+ except Exception as underlying_exc:
if self._payload is not None:
- self._payload.set_exception(
- ClientPayloadError("Response payload is not completed")
+ reraised_exc = ClientPayloadError(
+ f"Response payload is not completed: {underlying_exc !r}",
)
+ reraised_exc.__cause__ = underlying_exc
+ self._payload.set_exception(reraised_exc)
if not self.is_eof():
if isinstance(exc, OSError):
diff --git a/tests/test_web_sendfile_functional.py b/tests/test_web_sendfile_functional.py
index 6a64f8aa2..ec0c04950 100644
--- a/tests/test_web_sendfile_functional.py
+++ b/tests/test_web_sendfile_functional.py
@@ -966,6 +966,7 @@ async def test_static_file_huge_cancel(aiohttp_client: Any, tmp_path: Any) -> No
sock = tr.get_extra_info("socket")
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024)
ret = web.FileResponse(file_path)
+ # task.cancel()
return ret
app = web.Application()
@@ -974,15 +975,21 @@ async def test_static_file_huge_cancel(aiohttp_client: Any, tmp_path: Any) -> No
client = await aiohttp_client(app)
resp = await client.get("/")
- assert resp.status == 200
+ # assert resp.status == 200
task.cancel()
- await asyncio.sleep(0)
+ # await asyncio.sleep(0)
+ await asyncio.sleep(10)
data = b""
while True:
try:
data += await resp.content.read(1024)
- except aiohttp.ClientPayloadError:
+ except aiohttp.ClientPayloadError as payload_err:
+ import sys, traceback
+ traceback.print_exc()
+ traceback.print_exception(payload_err.__cause__)
+ breakpoint()
break
+ assert resp.status == 200
assert len(data) < 1024 * 1024 * 20
await resp.release()
+ # task.cancel()
This still produces an error when the task isn't cancelled? That doesn't seem right.
But, the changes to logging should probably be put into a PR regardless.
> + # task.cancel()
> This still produces an error when the task isn't cancelled? That doesn't seem right.
No, that's a leftover. There's still the original cancel outside the handler.
> But, the changes to logging should probably be put into a PR regardless.
Totally. I haven't gotten to that. Just experimenting with stuff at the moment. Feel free to copy my patch into a PR.
> But, the changes to logging should probably be put into a PR regardless.
A more complete PR should probably update aiohttp.helpers.set_exception() with a new cause argument (having some sentinel default, because None has a special meaning). That helper should (maybe?) check whether the exception already has a cause set (not sure). And all set_exception() invocations would need to be replaced with the helper. I don't see any calls actually setting causes across the entire project:
$ rg -B2 \\bset_exception\\b
tests/test_connector.py
1646- conn._create_connection = mock.Mock()
1647- conn._create_connection.return_value = loop.create_future()
1648: conn._create_connection.return_value.set_exception(err)
tests/test_streams.py
267- assert b"li" == data
268-
269: stream.set_exception(ValueError())
--
376- assert b"line\n" == data
377-
378: stream.set_exception(ValueError())
--
498- assert b"line" + separator == data
499-
500: stream.set_exception(ValueError("Another exception"))
--
572- assert b"li" == data
573-
574: stream.set_exception(ValueError())
--
644-
645- exc = ValueError()
646: stream.set_exception(exc)
--
652-
653- async def set_err():
654: stream.set_exception(ValueError())
--
672- t.cancel()
673- await asyncio.sleep(0)
674: # The following line fails if set_exception() isn't careful.
675: stream.set_exception(RuntimeError("message"))
--
706- assert b"line\n" == data
707-
708: stream.set_exception(ValueError())
--
734- stream = self._make_one()
735- stream.feed_data(b"line\n")
736: stream.set_exception(ValueError())
--
1070- stream = self._make_one()
1071- exc = RuntimeError()
1072: stream.set_exception(exc)
--
1101- s = streams.EmptyStreamReader()
1102- assert str(s) is not None
1103: assert s.set_exception(ValueError()) is None
--
1206- item = object()
1207- buffer.feed_data(item)
1208: buffer.set_exception(ValueError)
--
1215-
1216- async def test_read_exception(self, buffer: Any) -> None:
1217: buffer.set_exception(ValueError())
--
1223- val = object()
1224- buffer.feed_data(val, 1)
1225: buffer.set_exception(ValueError())
--
1236-
1237- buffer.feed_eof()
1238: buffer.set_exception(ValueError())
--
1245-
1246- exc = ValueError()
1247: buffer.set_exception(exc)
--
1252-
1253- async def set_err():
1254: buffer.set_exception(ValueError())
--
1399-
1400- exc = ValueError()
1401: reader.set_exception(exc)
--
1417-
1418- exc = ValueError()
1419: reader.set_exception(exc)
--
1432- reader.on_eof(on_eof)
1433-
1434: reader.set_exception(ValueError())
tests/test_client_request.py
1017- async def throw_exc():
1018- await asyncio.sleep(0.01)
1019: fut.set_exception(ValueError)
--
1024- await req._writer
1025- # assert conn.close.called
1026: assert conn.protocol.set_exception.called
--
1042- async def throw_exc():
1043- await asyncio.sleep(0.01)
1044: fut.set_exception(inner_exc)
--
1049- await req._writer
1050- # assert connection.close.called
1051: assert conn.protocol.set_exception.called
1052: outer_exc = conn.protocol.set_exception.call_args[0][0]
--
1149- await req.write_bytes(writer, conn)
1150-
1151: assert conn.protocol.set_exception.called
1152: exc = conn.protocol.set_exception.call_args[0][0]
tests/test_web_functional.py
1847- async with resp:
1848- assert resp.status == 200
1849: resp.content.set_exception(RuntimeError())
tests/test_client_functional.py
954- url, headers = server.make_url("/"), {"Connection": "Keep-alive"}
955- resp = await session.get(url, headers=headers)
956: resp.content.set_exception(ValueError())
--
2694- with pytest.raises(RuntimeError):
2695- async with session.get(server.make_url("/")) as resp:
2696: resp.content.set_exception(RuntimeError())
--
2731- with pytest.raises(RuntimeError):
2732- async with cm as resp:
2733: resp.content.set_exception(RuntimeError())
tests/test_web_websocket.py
313- exc = EofStream()
314- res = loop.create_future()
315: res.set_exception(exc)
--
331- ws._reader = mock.Mock()
332- res = loop.create_future()
333: res.set_exception(asyncio.TimeoutError())
tests/test_client_ws.py
317-
318- exc = EofStream()
319: resp._reader.set_exception(exc)
--
349-
350- exc = ValueError()
351: resp._reader.set_exception(exc)
--
467-
468- exc = ValueError()
469: resp._reader.set_exception(exc)
tests/test_worker.py
248- worker.alive = False
249- assert waiter is not None
250: waiter.set_exception(RuntimeError())
tests/test_client_response.py
55- connection.protocol = aiohttp.DataQueue(loop)
56- connection.protocol.set_response_params = mock.Mock()
57: connection.protocol.set_exception(http.HttpProcessingError())
--
234- content = response.content = mock.Mock()
235- content.read.return_value = loop.create_future()
236: content.read.return_value.set_exception(ValueError)
tests/test_helpers.py
697-
698-
699:# ------------- set_result / set_exception ----------------------
--
717-async def test_set_exception(loop) -> None:
718- fut = loop.create_future()
719: helpers.set_exception(fut, RuntimeError())
--
725- fut = loop.create_future()
726- fut.cancel()
727: helpers.set_exception(fut, RuntimeError())
aiohttp/base_protocol.py
77- waiter.set_result(None)
78- else:
79: waiter.set_exception(exc)
aiohttp/streams.py
15-
16-from .base_protocol import BaseProtocol
17:from .helpers import BaseTimerContext, TimerNoop, set_exception, set_result
--
147- return self._exception
148-
149: def set_exception(self, exc: BaseException) -> None:
--
154- if waiter is not None:
155- self._waiter = None
156: set_exception(waiter, exc)
--
159- if waiter is not None:
160- self._eof_waiter = None
161: set_exception(waiter, exc)
--
500- return None
501-
502: def set_exception(self, exc: BaseException) -> None:
--
575- return self._exception
576-
577: def set_exception(self, exc: BaseException) -> None:
--
582- if waiter is not None:
583- self._waiter = None
584: set_exception(waiter, exc)
aiohttp/http_websocket.py
306- except Exception as exc:
307- self._exc = exc
308: self.queue.set_exception(exc)
aiohttp/client_proto.py
12-from .helpers import (
13- BaseTimerContext,
14: set_exception,
--
82-
83- if exc is not None:
84: set_exception(self.closed, exc)
--
96- except Exception:
97- if self._payload is not None:
98: self._payload.set_exception(
--
107- # assigns self._should_close to True as side effect,
108- # we do it anyway below
109: self.set_exception(exc)
--
129- self._reschedule_timeout()
130-
131: def set_exception(self, exc: BaseException) -> None:
132- self._should_close = True
133- self._drop_timeout()
134: super().set_exception(exc)
--
207- def _on_read_timeout(self) -> None:
208- exc = ServerTimeoutError("Timeout on reading data from socket")
209: self.set_exception(exc)
210- if self._payload is not None:
211: self._payload.set_exception(exc)
--
242- self.transport.close()
243- # should_close is True after the call
244: self.set_exception(exc)
aiohttp/client_reqrep.py
570- except OSError as exc:
571- if exc.errno is None and isinstance(exc, asyncio.TimeoutError):
572: protocol.set_exception(exc)
--
577- new_exc.__context__ = exc
578- new_exc.__cause__ = exc
579: protocol.set_exception(new_exc)
--
581- await writer.write_eof()
582- except Exception as exc:
583: protocol.set_exception(exc)
--
1021- # content can be None here, but the types are cheated elsewhere.
1022- if content and content.exception() is None: # type: ignore[truthy-bool]
1023: content.set_exception(ClientConnectionError("Connection closed"))
aiohttp/http_parser.py
438- except BaseException as exc:
439- if self.payload_exception is not None:
440: self._payload_parser.payload.set_exception(
--
442- )
443- else:
444: self._payload_parser.payload.set_exception(exc)
--
823- chunk[:pos].decode("ascii", "surrogateescape")
824- )
825: self.payload.set_exception(exc)
--
926- self.decompressor = ZLibDecompressor(encoding=encoding)
927-
928: def set_exception(self, exc: BaseException) -> None:
929: self.out.set_exception(exc)
aiohttp/web_ws.py
536- def _cancel(self, exc: BaseException) -> None:
537- if self._reader is not None:
538: self._reader.set_exception(exc)
aiohttp/_http_parser.pyx
766- except BaseException as exc:
767- if pyparser._payload_exception is not None:
768: pyparser._payload.set_exception(pyparser._payload_exception(str(exc)))
769- else:
770: pyparser._payload.set_exception(exc)
aiohttp/web_request.py
814-
815- def _cancel(self, exc: BaseException) -> None:
816: self._payload.set_exception(exc)
aiohttp/web_protocol.py
578- self.close()
579-
580: payload.set_exception(PayloadAccessError())
aiohttp/helpers.py
798-
799-
800:def set_exception(fut: "asyncio.Future[_T]", exc: BaseException) -> None:
801- if not fut.done():
802: fut.set_exception(exc)
I was running into this rather frequently when downloading larger files, like PDFs. I'm using iter_chunked() with custom decompression logic. I was able to mitigate it, to the point that it's no longer troublesome, by increasing the chunk size from 1 kB to 1 MB. After this change, the issue disappeared almost completely.
I suggest trying this out until a real fix is released.
Also, I suggest adding some recommendation to the docs on how to select a proper chunk size, as there really is no guidance beyond the example code. In my case, I not only got rid of the issue, but my requests became about 10 times faster for larger files. I guess the example's 1024 is too small even for common HTML documents.
I can reliably reproduce at least one case of this. In case it's helpful for anyone debugging this, here's some details so far.
If I modify connection_lost in client_proto.py to include the full exception, like so:
if self._parser is not None:
    try:
        uncompleted = self._parser.feed_eof()
    except Exception as e:
        if self._payload is not None:
            self._payload.set_exception(
                e  # ClientPayloadError("Response payload is not completed")
            )
Then I get the following exception in the app code:
| File "<redacted>/python3.11/site-packages/aiohttp/client_proto.py", line 82, in connection_lost
| uncompleted = self._parser.feed_eof()
| ^^^^^^^^^^^^^^^^^^^^^^^
| File "aiohttp/_http_parser.pyx", line 510, in aiohttp._http_parser.HttpParser.feed_eof
| aiohttp.http_exceptions.ContentLengthError: 400, message:
| Not enough data for satisfy content length header.
Some additional bits of info:
- TCP dump confirms the other side is not closing the connection
- My repro is clearly racy - it ends up here 9 times out of 10, but sometimes completes without error
- If I set a breakpoint or add a print statement in the close method in client_proto.py, the problem goes away... but also nothing gets printed and the breakpoint never triggers?
Update: I think I was using the API incorrectly? I didn't realize ClientSession.request returned a context manager; I just used a regular await on it. My original code, failing with the issue in this ticket, looked like so:
async def example(http: aiohttp.ClientSession, headers, data):
    res = await http.request("POST", "http://example.com",
                             headers=headers, data=data)
    body = await res.read()
    return body
Changing it to use the context manager seems to have resolved the issue for me:
async def example(http: aiohttp.ClientSession, headers, data):
    async with http.request("POST", "http://example.com",
                            headers=headers, data=data) as res:
        body = await res.read()
        return body
I am pretty sure both versions of that code should do the same thing...
Would be great if you could figure out a full test in a PR to reproduce the issue.
Note we've also got a PR this week which will add the underlying exception information in future versions of aiohttp, so you won't need to modify the code in future to figure that out.
> I am pretty sure both versions of that code should do the same thing...
I think closing the response might perform the closing network exchange more predictably. Without it, perhaps a timeout happens before the io loop gets to that.
@jakewins if you want to get more details before my PR is merged and released, you could apply https://github.com/aio-libs/aiohttp/pull/8089.diff which extends what's revealed in tracebacks.
> I am pretty sure both versions of that code should do the same thing...
> I think closing the response might perform the closing network exchange more predictably. Without it, perhaps a timeout happens before the io loop gets to that.
I've been involved with that code a fair bit last year; in both cases the connection should be closed upon returning from the .read() call.
@Dreamsorcerer aye; looks like you're right, changing the prod code to use the context manager did not make the error go away, though it seems to have reduced the rate a lot, maybe 80% or so, but it's still happening.
The repro I have depends on running one of our services locally and letting it make this specific call asking for a large payload into our infrastructure via VPN, so I can't share it as-is, but I will try next week and create a free-standing repro
Hey everyone, I just released v3.9.4rc0. It'll hopefully help us get more complete tracebacks, and that, in turn, will let us know the underlying causes for this exception being raised — there may be several distinct behaviors underneath, and we need to discover what they are.
If you have the ability to do so, please try out this pre-release in environments where you can reproduce your problems, and come back with logs containing complete tracebacks. Let's figure this out together!
@jakewins @xaralis @rlippmann @lmazuel @ShantanuNair @TheOnlyWayUp @gelodefaultbrain @junbaibai0719 @joelcorporan @aralroca @a-gn @tharradine @tildedave @WH-2099 @pablogsal @hauntsaninja
I'm hitting the same problem. I use aiohttp to request a sub-URL in a Flask async view function. About 10% of requests during load testing raise this error. There is no error when testing one by one.
@se7enXF what does the recent release candidate reveal in terms of the cause? Without a reproducer and logs/tracebacks, your message is not very useful.
Interestingly, ever since I deployed the pre-release, I stopped seeing this problem where it was happening.
This means we'll have to rely on people here to try to reproduce it in their environments. Just remember to try out v3.9.4rc0 to get useful information about the causal chain.
My project is using Python 3.7, which is not compatible with 3.9.4rc0. Does 4.0.0a0 or 4.0.0a1 fix this issue? I will have a try.
Python 3.7 itself is not supported and may have known security issues. We won't put any effort into supporting a version of Python which is not itself supported in any way, so you'll need to upgrade to a more recent version to test anything or get any support from us.
Does 4.0.0a0 or 4.0.0a1 fix this issue?
No, those are very old pre-releases for old versions of the master branch. Don't use them.
@Dreamsorcerer @webknjaz
Thanks for your reply. 3.9.4rc0 works for me with python3.8!
Now it raises a Timeout error, which is reasonable in load testing.
I'm not sure if this is relevant here, but maybe the thing to bear in mind is that, when calling stuff like transport.close(), a lot of the callbacks that happen as a result are scheduled via asyncio.call_soon(), so some things you may expect to be true after transport.close() returns may not yet be true?
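That call_soon() ordering can be demonstrated with plain asyncio (a minimal stdlib sketch, not aiohttp code; all names here are illustrative):

```python
import asyncio

# Callbacks scheduled with call_soon() run on a later iteration of the event
# loop, so state they mutate is not yet visible right after scheduling.
events = []

async def main():
    loop = asyncio.get_running_loop()
    loop.call_soon(events.append, "callback ran")  # stand-in for close() cleanup
    events.append("right after call_soon")         # this executes first
    await asyncio.sleep(0)                         # yield; the callback runs now
    events.append("after yielding")

asyncio.run(main())
print(events)
```

So any state that a transport's close-related callbacks set is only guaranteed once the coroutine has yielded back to the loop at least once.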
I have a reproducer:
+ Exception Group Traceback (most recent call last):
| File "/var/src/aiohttp/test.py", line 42, in <module>
| asyncio.run(main())
| File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
| return runner.run(main)
| ^^^^^^^^^^^^^^^^
| File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
| return self._loop.run_until_complete(task)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/usr/lib64/python3.12/asyncio/base_events.py", line 685, in run_until_complete
| return future.result()
| ^^^^^^^^^^^^^^^
| File "/var/src/aiohttp/test.py", line 37, in main
| async with asyncio.TaskGroup() as g:
| File "/usr/lib64/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
| raise me from None
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/var/src/aiohttp/aiohttp/client_proto.py", line 83, in connection_lost
| uncompleted = self._parser.feed_eof()
| ^^^^^^^^^^^^^^^^^^^^^^^
| File "/var/src/aiohttp/aiohttp/http_parser.py", line 277, in feed_eof
| self._payload_parser.feed_eof()
| File "/var/src/aiohttp/aiohttp/http_parser.py", line 788, in feed_eof
| raise TransferEncodingError(
| aiohttp.http_exceptions.TransferEncodingError: 400, message:
Not enough data for satisfy transfer length header.
|
| The above exception was the direct cause of the following exception:
|
| Traceback (most recent call last):
| File "/var/src/aiohttp/test.py", line 15, in do_stuff
| async for chunk in resp.content.iter_chunked(8192):
| File "/var/src/aiohttp/aiohttp/streams.py", line 44, in __anext__
| rv = await self.read_func()
| ^^^^^^^^^^^^^^^^^^^^^^
| File "/var/src/aiohttp/aiohttp/streams.py", line 347, in read
| raise self._exception
| aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed
Unfortunately I accidentally deleted my fugazi server; give me a moment while I rewrite it.
https://gist.github.com/gtedesco-r7/0276fc7a00ebe568afc7132ce64b011c
It explodes for me after a couple of minutes:
<<< 8327 bytes
(same ClientPayloadError traceback as in the previous comment)
real 1m50.561s
user 0m48.431s
sys 0m7.836s
HttpPayloadParser::feed_eof() - PARSE_CHUNKED
<class 'aiohttp.streams.StreamReader'> <StreamReader 82000 bytes>
155728 82000
Yeah, feed_eof() is being called while StreamReader still has 82000 bytes in its buffer. Seems like that should be impossible, right?
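For comparison, feeding EOF while data is still buffered is a perfectly legal state for a plain stream reader; the stdlib asyncio.StreamReader (not aiohttp's own StreamReader, which sits behind an HTTP parser) keeps the buffered bytes readable:

```python
import asyncio

async def main() -> int:
    rd = asyncio.StreamReader()
    rd.feed_data(b"x" * 82000)  # 82000 bytes still buffered...
    rd.feed_eof()               # ...when EOF arrives
    data = await rd.read(-1)    # the buffered bytes remain readable
    return len(data)

buffered = asyncio.run(main())
print(buffered)
```

Which suggests the error comes from the HTTP parser's framing check at EOF rather than from the buffer being non-empty as such.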
Weird thing is, even my little bit of debugging seems to turn it into a TimeoutError; I had to get rid of the read timeout in order to get it to reliably crash in the usual way.
I don't remember the details, if you think you can dig into the code and figure out exactly what's going wrong, that'd be great.
@gtedesco-r7 is that with version 3.9.4rc0? Pure-python or Cython/C-extension?
I still haven't gotten any instances of this exception in my deployments since Feb 16.
| File "/var/src/aiohttp/aiohttp/streams.py", line 347, in read | raise self._exception
@gtedesco-r7 this line number correlates with code placement in an older aiohttp 3.9.3. Could you check your repro against 3.9.4rc0?
@gtedesco-r7 your gist doesn't reproduce the problem for me under Python 3.11. Neither aiohttp 3.9.3, nor 3.9.4rc0 breaks like that. Though, it appears you might be using a pure-Python implementation of the parser and not the Cython-based one that's default when available.
@gtedesco-r7 based on the above observation, your specific case should be reproducible with AIOHTTP_NO_EXTENSIONS=1 set, but I'm still unable to confirm it so far.
UPD: it took 6m 13s on my machine and the reproducer in the gist produced a similar exception with Python 3.11, aiohttp 3.9.3 and AIOHTTP_NO_EXTENSIONS=1.
UPD: it took 1m 21s of running the repro with Python 3.11, aiohttp 3.9.4rc0 and AIOHTTP_NO_EXTENSIONS=1 to get the following (more detailed) traceback:
+ Exception Group Traceback (most recent call last):
| File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/aiohttp-test.py", line 42, in <module>
| asyncio.run(main())
| File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/runners.py", line 190, in run
| return runner.run(main)
| ^^^^^^^^^^^^^^^^
| File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/runners.py", line 118, in run
| return self._loop.run_until_complete(task)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
| return future.result()
| ^^^^^^^^^^^^^^^
| File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/aiohttp-test.py", line 37, in main
| async with asyncio.TaskGroup() as g:
| File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/taskgroups.py", line 147, in __aexit__
| raise me from None
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/client_proto.py", line 94, in connection_lost
| uncompleted = self._parser.feed_eof()
| ^^^^^^^^^^^^^^^^^^^^^^^
| File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/http_parser.py", line 279, in feed_eof
| self._payload_parser.feed_eof()
| File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/http_parser.py", line 793, in feed_eof
| raise TransferEncodingError(
| aiohttp.http_exceptions.TransferEncodingError: 400, message:
Not enough data for satisfy transfer length header.
|
| The above exception was the direct cause of the following exception:
|
| Traceback (most recent call last):
| File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/aiohttp-test.py", line 15, in do_stuff
| async for chunk in resp.content.iter_chunked(8192):
| File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/streams.py", line 50, in __anext__
| rv = await self.read_func()
| ^^^^^^^^^^^^^^^^^^^^^^
| File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/streams.py", line 357, in read
| raise self._exception
| aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
+------------------------------------
And on the server side:
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-19080' coro=<fugazi_web() done, defined at ~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/fake-server.py:8> exception=ConnectionResetError(104, 'Connection reset by peer')>
Traceback (most recent call last):
File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/fake-server.py", line 13, in fugazi_web
req = await rd.read(8192)
^^^^^^^^^^^^^^^^^^^
File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/streams.py", line 694, in read
await self._wait_for_data('read')
File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/streams.py", line 526, in _wait_for_data
await self._waiter
File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/selector_events.py", line 995, in _read_ready__data_received
data = self._sock.recv(self.max_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
The Cython-based implementation still doesn't crash with that snippet. So it's either fast enough for the race not to happen, or the implementation doesn't have the same bug as pure-python.
Looking into the source, it's clear that connection_lost() was called on a clean connection close (with exc=None, that is). Then inside, it's maybe calling self._payload_parser.feed_eof(), which immediately suppresses any exceptions; this first call under if self._payload_parser is not None: would've raised a <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'> but this remains hidden. Looking into https://github.com/aio-libs/aiohttp/blob/v3.9.4rc0/aiohttp/http_parser.py#L793, it's obvious that this would happen only if the HttpPayloadParser() instance is initialized with chunked = True; this is because self._type = ParseState.PARSE_CHUNKED is only set once in the initializer and is never mutated later. Any such call to HttpPayloadParser.feed_eof() would end up like this for objects initialized for chunked parsing. So the expectation is that it shouldn't really be called in chunked mode.
Back to connection_lost(): it then proceeds to call self._parser.feed_eof(), which itself also calls the same HttpPayloadParser.feed_eof() at the very beginning and should then set the internal attribute self._payload_parser = None, which never happens because again feed_eof() explodes one line earlier. The exception bubbles up back to our connection_lost(), but this time it's not suppressed silently; it's reraised as ClientPayloadError() under if self._payload is not None:, even though the network connection is clean (the exception message would contain an extra sentence if it wasn't).
So this means that if, by the time ResponseHandler.connection_lost() was called, something had set ResponseHandler._payload = None or ResponseHandler._parser = None, this wouldn't cause the exception. Additionally, if we were to suppress TransferEncodingError around that HttpPayloadParser.feed_eof() call in ResponseHandler.connection_lost(), this would cover the case of illegally calling HttpPayloadParser.feed_eof() in chunked mode, which always raises TransferEncodingError.
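The chunked-framing rule behind that TransferEncodingError can be sketched in miniature (a simplified, hypothetical decoder for illustration only; aiohttp's real parser lives in aiohttp/http_parser.py):

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a complete HTTP/1.1 chunked body; raise if it is truncated."""
    body = b""
    while True:
        size_line, sep, rest = data.partition(b"\r\n")
        if not sep:
            # EOF before a full size line: the transfer length is unsatisfied
            raise ValueError("Not enough data for satisfy transfer length header.")
        size = int(size_line, 16)
        if size == 0:
            return body  # the terminating 0-size chunk: message is complete
        if len(rest) < size + 2:
            raise ValueError("Not enough data for satisfy transfer length header.")
        body += rest[:size]
        data = rest[size + 2:]  # skip the chunk payload and its trailing CRLF

print(decode_chunked(b"5\r\nhello\r\n0\r\n\r\n"))  # complete message
try:
    decode_chunked(b"5\r\nhello\r\n")  # connection closed before the 0-chunk
except ValueError as exc:
    print("incomplete:", exc)
```

A connection close before the 0-size terminator is exactly the "clean close" case above: the bytes received so far may already be a complete-looking payload, yet the framing is still unfinished.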
@Dreamsorcerer @bdraco thoughts? ideas?
I couldn't reproduce it after 30 mins. I'd probably need to inspect the raw data going through the parser to understand it properly.
@Dreamsorcerer sometimes, it was non-reproducible at all for me too. But given enough time, it's reproducible with the pure-Python implementation. Make sure to have that env var set, just in case. I was also adding a print() to ensure Cython doesn't get selected.
It took about 45 minutes to happen on my system
+ Exception Group Traceback (most recent call last):
| File "/Users/bdraco/aiohttp/4581_client.py", line 45, in <module>
| asyncio.run(main())
| File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
| return runner.run(main)
| ^^^^^^^^^^^^^^^^
| File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
| return self._loop.run_until_complete(task)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete
| return future.result()
| ^^^^^^^^^^^^^^^
| File "/Users/bdraco/aiohttp/4581_client.py", line 39, in main
| async with asyncio.TaskGroup() as g:
| File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
| raise me from None
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/Users/bdraco/aiohttp/aiohttp/client_proto.py", line 94, in connection_lost
| uncompleted = self._parser.feed_eof()
| ^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/bdraco/aiohttp/aiohttp/http_parser.py", line 284, in feed_eof
| self._payload_parser.feed_eof()
| File "/Users/bdraco/aiohttp/aiohttp/http_parser.py", line 799, in feed_eof
| raise TransferEncodingError(
| aiohttp.http_exceptions.TransferEncodingError: 400, message:
Not enough data for satisfy transfer length header.
|
| The above exception was the direct cause of the following exception:
|
| Traceback (most recent call last):
| File "/Users/bdraco/aiohttp/4581_client.py", line 16, in do_stuff
| async for chunk in resp.content.iter_chunked(8192):
| File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 50, in __anext__
| rv = await self.read_func()
| ^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 357, in read
| raise self._exception
| aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
+------------------------------------
@bdraco it was inconsistent for me too. I'm mostly curious what you think of what I wrote in #4581 (comment). What I suggested should work in theory but I'm worried it's only addressing the consequence and not the root cause.
I'm poking at it a bit to see if I can find the root cause, but given the long test cycle it may take a while, and I'm time-limited for the next few days.
Looks like sometimes I get
+ Exception Group Traceback (most recent call last):
| File "/Users/bdraco/aiohttp/4581_client.py", line 45, in <module>
| asyncio.run(main())
| File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
| return runner.run(main)
| ^^^^^^^^^^^^^^^^
| File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
| return self._loop.run_until_complete(task)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete
| return future.result()
| ^^^^^^^^^^^^^^^
| File "/Users/bdraco/aiohttp/4581_client.py", line 39, in main
| async with asyncio.TaskGroup() as g:
| File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
| raise me from None
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/Users/bdraco/aiohttp/4581_client.py", line 16, in do_stuff
| async for chunk in resp.content.iter_chunked(8192):
| File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 50, in __anext__
| rv = await self.read_func()
| ^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 393, in read
| await self._wait("read")
| File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 311, in _wait
| with self._timer:
| File "/Users/bdraco/aiohttp/aiohttp/helpers.py", line 735, in __exit__
| raise asyncio.TimeoutError from None
| TimeoutError
+------------------------------------
Interesting... I never got that timeout. But it makes sense given that there seems to be a race between something unsetting ResponseHandler._parser / ResponseHandler._payload and ResponseHandler.connection_lost() being called.
Sorry - was there an official fix for this? I am able to reproduce when utilizing aiohttp through aiobotocore.
Basically:
File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/de_common/aws/s3.py", line 229, in get_object
body = await response["Body"].read()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiobotocore/response.py", line 56, in read
chunk = await self.__wrapped__.content.read(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 383, in read
block = await self.readany()
^^^^^^^^^^^^^^^^^^^^
File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 405, in readany
await self._wait("readany")
File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 312, in _wait
await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>
Anyone have any suggestions or thoughts? I'm using aiohttp==3.9.5
I think I have a reliably reproducible example that is derived from @gtedesco-r7's example
https://gist.github.com/toffeegryphon/6a94a7883923f80c2259bbb297bb0d3b
Python 3.12
aiohttp 3.9.3
A ClientPayloadError(TransferEncodingError) is raised when the server closes the connection (or rather, the client realizes the connection is closed) before the client reads the end-of-response zero-size chunk, and a ServerDisconnectedError is raised when the server closes the connection after the end-of-response chunk is read (specifically, when a TCP connector is reused). A 104 Peer Disconnected seems to be raised occasionally; I believe it happens when the client realizes the connection is closed while the response buffer is being read, but I haven't looked too closely at whether this is true or not.
Was there any traction on this (or one of the other incredibly similar errors)? After reading all sorts of posts, I suspect I'm getting some form of throttling that causes things to get weird. I can't seem to get a better error message no matter how I try, probably because one of the libraries in the middle swallows it.
I'm downloading a 124.5MB file from Google Cloud using the gcloud-aio library and fastapi with a streaming response. And I intermittently get this exception. My download takes roughly 4-5 seconds when working correctly. But if I try it a few times in a row (say like 5 or 6), it errors out instead of working.
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>. SSLError(1, '[SSL] record layer failure (_ssl.c:2580)')
I suspect this is a me problem in some manner, but I can't seem to figure out what issue I'm trying to fix.
Here's my fastapi endpoint:
import aiohttp
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from gcloud.aio.storage import Storage  # gcloud-aio-storage

app = FastAPI()

@app.get("/download")  # route path illustrative; bucket/file resolved elsewhere
async def download(request: Request) -> StreamingResponse:
    async def generate_response():
        # we need to make a new session inside this generator
        # because the one on the request context is closed when we return
        # the streaming response, and then it goes byebye and fails
        # the streaming download out with a partial file error.
        async with aiohttp.ClientSession(raise_for_status=True) as blob_session:
            new_storage_client = Storage(session=blob_session)
            async with await new_storage_client.download_stream(bucket, file) as stream:
                yield await stream.read()
    return StreamingResponse(generate_response())
For our situation, we've figured out that, yeah, it was legitimately the connection being broken from the server side (i.e. no clear bug in aiohttp). We had nginx as a load balancer/reverse proxy in front of our servers, and whenever a reload happened, the workers would get killed some time after, and any long-running requests would get forcefully terminated (and correctly reported as an error).
I can give it a shot this weekend when I'll have the ability to let it run for an extended period (assuming my reattempt at a long travel day tomorrow doesn't go as poorly as my attempt on Wednesday and I actually get to my destination)
Hey! I'm encountering a similar issue where the connection appears to be terminated on the server, resulting in a ClientPayloadError(TransferEncodingError) on the client side. This occurs rarely and is difficult to reproduce.
Would simply retrying the request on the client side be sufficient to handle this error, or would it be necessary to recreate the aiohttp session (or implement any special error-handling logic)? Will aiohttp attempt to reconnect to the server automatically upon a retry?
The project you've referenced above appears to be using 3.10.3, the fix I mentioned above is in 3.10.4. If you're still experiencing problems on the latest release, you'll have to try and figure out a reproducer or some way to log the raw chunks. However, you should verify whether you are using the compiled C parser (e.g. wheels from pip) or the Python parser (which can be forced with AIOHTTP_NO_EXTENSIONS=1 envvar) first.
If the server is sending malformed responses, then the behaviour appears correct, if you send another request to retry, then aiohttp should close the bad connection and create a new connection for the next request. If the responses are correct, then we'd need to find the cause of the issue here.
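So a plain retry against the same session is enough in that case. A minimal sketch of such a wrapper (generic stdlib code; the names retry and coro_factory are hypothetical, and the flaky coroutine below merely stands in for an aiohttp request):

```python
import asyncio

async def retry(coro_factory, attempts=3, retry_on=(Exception,), delay=0.0):
    """Call coro_factory() up to `attempts` times, re-raising the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return await coro_factory()
        except retry_on:
            if attempt == attempts:
                raise
            await asyncio.sleep(delay)

# Demo: a request stand-in that fails twice with a broken payload, then succeeds.
calls = {"n": 0}

async def flaky() -> bytes:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Response payload is not completed")
    return b"payload"

result = asyncio.run(retry(flaky, attempts=5, retry_on=(ConnectionError,)))
print(result, calls["n"])
```

With aiohttp, retry_on would typically include aiohttp.ClientPayloadError and aiohttp.ServerDisconnectedError, and the same ClientSession can be reused across attempts since the broken connection is dropped from the pool.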
@Dreamsorcerer I'm afraid I'm at capacity these days, so I'll defer to @bdraco for now.
I started running the test client again. I'll leave it for a few hours and report back
while python3 4581_client.py; do :; done
No longer reproducible on 3.10.5. I'd say this is fixed
I'm getting this error with the following:
male_doctor = 'Arzt' # works
FEMALE_DOCTOR = 'Ärztin' # THROWS
async with aiohttp.ClientSession(
connector=aiohttp.TCPConnector(ttl_dns_cache=300, enable_cleanup_closed=True),
cookie_jar=aiohttp.DummyCookieJar(),
raise_for_status=True,
timeout=aiohttp.ClientTimeout(sock_connect=10)
) as session:
async with session.get('https://www.linguee.com/english-german/search',
params={'query': FEMALE_DOCTOR}) as rsp:
txt = await rsp.text()
The server returns slightly different responses to successful and unsuccessful queries.
Arzt has Content-Length and no 'Transfer-Encoding': 'chunked':
<ClientResponse(https://www.linguee.com/english-german/search?query=Arzt) [200 None]>
<CIMultiDictProxy('Date': 'Thu, 03 Oct 2024 21:12:17 GMT', 'Content-Type': 'text/html; charset="iso-8859-15"', 'Content-Length': '24220', 'Cache-Control': 'public, max-age=86400', 'x-trace-id': '0f2ae5c129b94c428e4173bc79e1b87f', 'Set-Cookie': 'ForeignLang=DE; SameSite=Strict; Max-Age=63072000; Path=/', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'br', 'Age': '5461', 'x-linguee-cache-status': 'HIT', 'x-cache-status': 'MISS', 'strict-transport-security': 'max-age=63072000; includeSubDomains; preload', 'server-timing': 'l7_lb_tls;dur=363, l7_lb_idle;dur=0, l7_lb_receive;dur=0, l7_lb_total;dur=364', 'Access-Control-Expose-Headers': 'Server-Timing, X-Trace-ID')>
Ärztin doesn't have Content-Length, but has 'Transfer-Encoding': 'chunked':
<ClientResponse(https://www.linguee.com/english-german/search?query=%C3%84rztin) [200 None]>
<CIMultiDictProxy('Date': 'Thu, 03 Oct 2024 21:17:26 GMT', 'Content-Type': 'text/html; charset="iso-8859-15"', 'Transfer-Encoding': 'chunked', 'Cache-Control': 'public, max-age=86400', 'x-trace-id': 'da7d20f6eba742e1b1d801e9ab56b688', 'Set-Cookie': 'ForeignLang=DE; SameSite=Strict; Max-Age=63072000; Path=/', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'br', 'Age': '5769', 'x-linguee-cache-status': 'HIT', 'x-cache-status': 'MISS', 'strict-transport-security': 'max-age=63072000; includeSubDomains; preload', 'server-timing': 'l7_lb_tls;dur=440, l7_lb_idle;dur=0, l7_lb_receive;dur=0, l7_lb_total;dur=445', 'Access-Control-Expose-Headers': 'Server-Timing, X-Trace-ID')>
May be a server problem, but I have no control over it, and browser and curl manage to parse the response successfully.
I'm on Linux.
Python 3.12.3
Name: aiohttp
Version: 3.10.8
Name: multidict
Version: 6.1.0
Name: yarl
Version: 1.13.1
Stacktrace:
---------------------------------------------------------------------------
TransferEncodingError Traceback (most recent call last)
File /opt/venv/net/lib/python3.12/site-packages/aiohttp/client_proto.py:92, in ResponseHandler.connection_lost(self, exc)
91 try:
---> 92 uncompleted = self._parser.feed_eof()
93 except Exception as underlying_exc:
File /opt/venv/net/lib/python3.12/site-packages/aiohttp/_http_parser.pyx:513, in aiohttp._http_parser.HttpParser.feed_eof()
TransferEncodingError: 400, message:
Not enough data for satisfy transfer length header.
The above exception was the direct cause of the following exception:
ClientPayloadError Traceback (most recent call last)
Cell In[29], line 8
1 async with aiohttp.ClientSession(
2 connector=aiohttp.TCPConnector(ttl_dns_cache=300, enable_cleanup_closed=True),
3 cookie_jar=aiohttp.DummyCookieJar(),
4 raise_for_status=True,
5 timeout=aiohttp.ClientTimeout(sock_connect=10)
6 ) as session:
7 async with session.get('https://www.linguee.com/english-german/search', params={'query': 'Ärztin'}) as rsp:
----> 8 txt = await rsp.text()
File /opt/venv/net/lib/python3.12/site-packages/aiohttp/client_reqrep.py:1220, in ClientResponse.text(self, encoding, errors)
1218 """Read response payload and decode."""
1219 if self._body is None:
-> 1220 await self.read()
1222 if encoding is None:
1223 encoding = self.get_encoding()
File /opt/venv/net/lib/python3.12/site-packages/aiohttp/client_reqrep.py:1178, in ClientResponse.read(self)
1176 if self._body is None:
1177 try:
-> 1178 self._body = await self.content.read()
1179 for trace in self._traces:
1180 await trace.send_response_chunk_received(
1181 self.method, self.url, self._body
1182 )
File /opt/venv/net/lib/python3.12/site-packages/aiohttp/streams.py:386, in StreamReader.read(self, n)
384 blocks = []
385 while True:
--> 386 block = await self.readany()
387 if not block:
388 break
File /opt/venv/net/lib/python3.12/site-packages/aiohttp/streams.py:408, in StreamReader.readany(self)
404 # TODO: should be `if` instead of `while`
405 # because waiter maybe triggered on chunk end,
406 # without feeding any data
407 while not self._buffer and not self._eof:
--> 408 await self._wait("readany")
410 return self._read_nowait(-1)
File /opt/venv/net/lib/python3.12/site-packages/aiohttp/streams.py:315, in StreamReader._wait(self, func_name)
313 try:
314 with self._timer:
--> 315 await waiter
316 finally:
317 self._waiter = None
ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
The server closes the connection without completing the message.
Browsers at least tend to render data as it's received, without worrying so much about whether it's received the full message or not.
If we wanted to support something like this, we'd need some way to signal to the user that the message is probably not complete; i.e. there is no way for us to know if what we received is a valid message or only part of one (or, strictly speaking, it is only part of a message).
I don't see it working with curl though:
> curl 'https://www.linguee.com/english-german/search?query=%C3%84rztin'
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
> curl 'https://www.linguee.com/english-german/search?query=Ärztin'
<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>
Warning: Binary output can mess up your terminal
Had the same response for both Arzt and Ärztin. When I said curl is working, I meant that there is no difference between a throwing and a non-throwing response as far as curl is concerned. I haven't played with command-line options; probably there was some way to decode the binary output.
Apparently they've fixed the server: I can't reproduce the problem anymore. curl also prints HTML to the console.
If we wanted to support something like this, we'd need some way to signal to the user that the message is probably not complete.
Would be appreciated; apparently the problem pops up from time to time. Maybe change ClientPayloadError: Response payload is not completed to a specific ClientIncompleteResponse: The server closed the connection without completing the message, and then make the partial response available? Someone could catch ClientIncompleteResponse and get whatever was received in rsp.text().
Had the same response for both Arzt and Ärztin. When I said curl is working I meant that there is no difference between a throwing and a non-throwing response as far as curl is concerned.
Huh, I didn't test curl with the other one. But, I was pretty sure that it was producing binary output because the message was incomplete. The binary output was the raw brotli encoded bytes, i.e. curl didn't attempt to decompress the response.
Maybe change ClientPayloadError: Response payload is not completed to a specific ClientIncompleteResponse: The server closed the connection without completing the message, and then make the partial response available? Someone can catch ClientIncompleteResponse and get whatever was received in rsp.text().
If you're using a .read() variant that loads the whole body into memory anyway, then that could possibly work. If someone wants to try and implement that, we'll review it. But, given it is literally invalid HTTP behaviour and very rarely encountered, I doubt any maintainers will volunteer to work on it.
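For anyone wanting to attempt it, the shape of the proposal might look roughly like this (a sketch only: ClientIncompleteResponse and partial_body are hypothetical names that do not exist in aiohttp, and the ClientPayloadError here is a stand-in class, not the real one):

```python
class ClientPayloadError(Exception):
    """Stand-in for aiohttp.ClientPayloadError."""

class ClientIncompleteResponse(ClientPayloadError):
    """The server closed the connection without completing the message."""

    def __init__(self, partial_body: bytes) -> None:
        super().__init__(
            "The server closed the connection without completing the message"
        )
        self.partial_body = partial_body

# A caller could then catch the specific error and keep the truncated payload:
try:
    raise ClientIncompleteResponse(b"<html><body>partial")
except ClientIncompleteResponse as exc:
    recovered = exc.partial_body

print(recovered)
```

Subclassing the existing error keeps current `except ClientPayloadError` handlers working, while letting callers who opt in recover whatever bytes arrived before the close.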