ConnectionBusy exception thrown from BeamDownloader
cburgdorf opened this issue · 7 comments
What is wrong?
I've noticed the following exception.
f2b7-40b4-a77a-6ba6d13da819>: Timed out waiting for NodeDataV65 request lock or connection: Connection-<Session <Node(0xd4062d@91.13.194.238)> 1c231c61-f2b7-40b4-a77a-6ba6d13da819>
DEBUG 2020-09-02 12:59:58,686 BeamDownloader Problem downloading nodes from peer, dropping...
Traceback (most recent call last):
File "/usr/src/app/trinity/p2p/exchange/candidate_stream.py", line 81, in payload_candidates
timeout=total_timeout * NUM_QUEUED_REQUESTS,
File "/usr/local/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/app/trinity/trinity/sync/beam/state.py", line 681, in _request_nodes
node_hashes, timeout=self._reply_timeout)
File "/usr/src/app/trinity/trinity/protocol/eth/exchanges.py", line 110, in __call__
timeout,
File "/usr/src/app/trinity/p2p/exchange/exchange.py", line 73, in get_result
timeout,
File "/usr/src/app/trinity/p2p/exchange/manager.py", line 59, in get_result
async for payload in stream.payload_candidates(request, tracker, timeout=timeout):
File "/usr/src/app/trinity/p2p/exchange/candidate_stream.py", line 85, in payload_candidates
f"Timed out waiting for {self.response_cmd_name} request lock "
p2p.exceptions.ConnectionBusy: Timed out waiting for NodeDataV65 request lock or connection: Connection-<Session <Node(0xd4062d@91.13.194.238)> 1c231c61-f2b7-40b4-a77a-6ba6d13da819>
It seems that after that error the client was left in a broken state and the logs were dominated by this warning.
WARNING 2020-09-02 13:09:50,331 BeamDownloader Unexpected p2p err while downloading nodes from ETHPeer (eth, 65) <Session <Node(0xd4062d@91.13.194.238)> 1c231c61-f2b7-40b4-a77a-6ba6d13da819>: Timed out waiting for NodeDataV65 request lock or connection: Connection-<Session <Node(0xd4062d@91.13.194.238)> 1c231c61-f2b7-40b4-a77a-6ba6d13da819>
Full debug log:
How can it be fixed?
The error suggests that we have multiple code paths making simultaneous NodeDataV65 requests to the same peer. @carver I guess this is happening because state backfill and regular state fetching are going against the same peer. We had a similar discussion when we hit the same error for GetBlockHeader when we introduced the header backfill. We ended up bumping up NUM_QUEUED_REQUESTS to allow more concurrent requests, but I'm not sure if that is the solution here 🤔
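The traceback hints at the shape of the code involved: `payload_candidates` waits on a per-connection request lock with `asyncio.wait_for(..., timeout=total_timeout * NUM_QUEUED_REQUESTS)` and converts the timeout into `ConnectionBusy`. A minimal sketch of that pattern (not Trinity's actual implementation; `RequestSlot` and the constant's value are made up for illustration) shows how two independent consumers of the same peer, e.g. backfill and regular state fetching, can pile up behind the lock:

```python
import asyncio

# Assumed value, for illustration only.
NUM_QUEUED_REQUESTS = 4


class ConnectionBusy(Exception):
    """Raised when a request could not acquire the per-connection lock in time."""


class RequestSlot:
    """Hypothetical per-peer slot: all requests are serialized behind one lock."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def request(self, send_and_wait, total_timeout: float):
        try:
            # Give up if the requests already queued ahead of us could each
            # burn their full timeout before we get a turn.
            await asyncio.wait_for(
                self._lock.acquire(),
                timeout=total_timeout * NUM_QUEUED_REQUESTS,
            )
        except asyncio.TimeoutError:
            raise ConnectionBusy("Timed out waiting for request lock or connection")
        try:
            return await asyncio.wait_for(send_and_wait(), timeout=total_timeout)
        finally:
            self._lock.release()
```

If more callers are queued than the multiplier accounts for while the peer is timing out on every request, the waiters at the back of the queue exceed the lock-wait timeout and raise `ConnectionBusy`.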
Yeah, there shouldn't be this many simultaneous requests; it's a sign of some other bug. So 👎 on increasing NUM_QUEUED_REQUESTS. No guesses yet on why you're seeing it, though. In order to trigger it, you usually need to have a peer that's good for a while, and then have it disappear and start timing out on all requests. So any test to try to recreate the situation would probably have to follow that script.
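A reproduction along the lines of that script might look like the following, reusing the hypothetical `RequestSlot` sketch above (all names are made up, this is not a Trinity test): a peer that answers quickly for a while and then goes silent, while several consumers keep issuing requests against it.

```python
import asyncio


async def reproduce() -> None:
    slot = RequestSlot()
    peer_alive = True

    async def send_and_wait() -> bytes:
        if peer_alive:
            await asyncio.sleep(0.01)   # healthy peer: fast replies
            return b"node-data"
        await asyncio.sleep(3600)       # vanished peer: never replies

    async def consumer(name: str) -> None:
        while True:
            try:
                await slot.request(send_and_wait, total_timeout=0.5)
            except (asyncio.TimeoutError, ConnectionBusy) as exc:
                print(f"{name}: {type(exc).__name__}")

    # More consumers than NUM_QUEUED_REQUESTS, so the waiters at the back of
    # the queue overrun the lock-wait timeout once the peer stops answering.
    tasks = [asyncio.create_task(consumer(f"consumer-{i}")) for i in range(6)]
    await asyncio.sleep(1)    # peer is good for a while...
    peer_alive = False        # ...then it disappears and every request times out
    await asyncio.sleep(10)   # expect ConnectionBusy from the queued consumers
    for task in tasks:
        task.cancel()


if __name__ == "__main__":
    asyncio.run(reproduce())
```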
There's a good chance that this error doesn't exist anymore. When I filed the issue I assumed I was testing the latest master, but due to a bug in my DappNode deployment workflow I ended up on old code. Now my node runs on the latest code and the bug has not happened within the past 18 hours. Closing for now.
Ah no, this is in fact still happening with the latest source.
WARNING 2020-09-10 01:38:37,849 BeamDownloader Unexpected p2p err while downloading nodes from ETHPeer (eth, 65) <Session <Node(0xcbceb7@172.33.0.2)> bd53557e-3322-4fe3-a664-d458de052586>: Timed out waiting for NodeDataV65 request lock or connection: Connection-<Session <Node(0xcbceb7@172.33.0.2)> bd53557e-3322-4fe3-a664-d458de052586>
DEBUG 2020-09-10 01:38:37,849 BeamDownloader Problem downloading nodes from peer, dropping...
Traceback (most recent call last):
File "/usr/src/app/trinity/p2p/exchange/candidate_stream.py", line 81, in payload_candidates
timeout=total_timeout * NUM_QUEUED_REQUESTS,
File "/usr/local/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/app/trinity/trinity/sync/beam/state.py", line 780, in _request_nodes
node_hashes, timeout=self._reply_timeout)
File "/usr/src/app/trinity/trinity/protocol/eth/exchanges.py", line 110, in __call__
timeout,
File "/usr/src/app/trinity/p2p/exchange/exchange.py", line 73, in get_result
timeout,
File "/usr/src/app/trinity/p2p/exchange/manager.py", line 59, in get_result
async for payload in stream.payload_candidates(request, tracker, timeout=timeout):
File "/usr/src/app/trinity/p2p/exchange/candidate_stream.py", line 85, in payload_candidates
f"Timed out waiting for {self.response_cmd_name} request lock "
p2p.exceptions.ConnectionBusy: Timed out waiting for NodeDataV65 request lock or connection: Connection-<Session <Node(0xcbceb7@172.33.0.2)> bd53557e-3322-4fe3-a664-d458de052586>
DEBUG 2020-09-10 01:38:37,877 QueeningQueue Penalizing ETHPeer (eth, 65) <Session <Node(0xcbceb7@172.33.0.2)> bd53557e-3322-4fe3-a664-d458de052586> for 2.00s, for minor infraction
WARNING 2020-09-10 01:38:37,904 BeamDownloader Unexpected p2p err while downloading nodes from ETHPeer (eth, 63) <Session <Node(0xd05d87@3.214.216.237)> 4a651c05-7993-445d-a9dc-0ae741ec7b5b>: Timed out waiting for NodeDataV65 request lock or connection: Connection-<Session <Node(0xd05d87@3.214.216.237)> 4a651c05-7993-445d-a9dc-0ae741ec7b5b>
DEBUG 2020-09-10 01:38:37,905 BeamDownloader Problem downloading nodes from peer, dropping...
Traceback (most recent call last):
File "/usr/src/app/trinity/p2p/exchange/candidate_stream.py", line 81, in payload_candidates
timeout=total_timeout * NUM_QUEUED_REQUESTS,
File "/usr/local/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/app/trinity/trinity/sync/beam/state.py", line 780, in _request_nodes
node_hashes, timeout=self._reply_timeout)
File "/usr/src/app/trinity/trinity/protocol/eth/exchanges.py", line 110, in __call__
timeout,
File "/usr/src/app/trinity/p2p/exchange/exchange.py", line 73, in get_result
timeout,
File "/usr/src/app/trinity/p2p/exchange/manager.py", line 59, in get_result
async for payload in stream.payload_candidates(request, tracker, timeout=timeout):
File "/usr/src/app/trinity/p2p/exchange/candidate_stream.py", line 85, in payload_candidates
f"Timed out waiting for {self.response_cmd_name} request lock "
p2p.exceptions.ConnectionBusy: Timed out waiting for NodeDataV65 request lock or connection: Connection-<Session <Node(0xd05d87@3.214.216.237)> 4a651c05-7993-445d-a9dc-0ae741ec7b5b>
Hm, any chance that 0xd05d87 disconnected and reconnected once or a couple times before this happened? I'm trying to think about how we could end up with so many live queries for GetNodeData.
Since you seem to be getting this more often than me, can you try out #2039?
Edit: well maybe not yet. I was a bit over-confident that it would work flawlessly on the first shot 😆
Edit 2: okay, looks resolved now...
Another thing that might be worth considering is just throwing a p2p timeout error instead. I actually think the current exception is showing us important information, but if we can't tackle it in the next week, then I think it's okay to stop emitting it. I think a timeout error is fine in all cases where a ConnectionBusy would be raised.
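A minimal sketch of that suggestion (not the actual change in #2039; the function and parameter names here are hypothetical): let the lock-wait timeout surface as the same kind of timeout a slow peer would produce, so callers only need one error-handling path.

```python
import asyncio


async def payload_candidates_sketch(lock, send_and_wait, total_timeout, num_queued):
    try:
        await asyncio.wait_for(lock.acquire(), timeout=total_timeout * num_queued)
    except asyncio.TimeoutError:
        # Previously this branch raised ConnectionBusy; the suggestion is to let
        # the timeout propagate (or wrap it in the existing p2p timeout error)
        # so a busy connection is handled the same way as a slow peer.
        raise
    try:
        return await asyncio.wait_for(send_and_wait(), timeout=total_timeout)
    finally:
        lock.release()
```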
Seems to have been resolved according to #2039 (review)
I can confirm that this fixes the ConnectionBusy issue 👍