ethereum/trinity

_match_predictive_node_requests_to_peers (daemon task) exits when BeamStateBackfill completes, crashing trinity

gsalgado opened this issue · 3 comments

I'm running two local trinity nodes (connected only to each other on a custom clique network), where one occasionally mines blocks and the other syncs from it. However, the syncing node always crashes when performing an initial sync, with the following traceback:

    INFO  2020-11-03 09:47:00,603     BeamStateBackfill  Downloaded all accounts, storage and bytecode state
    INFO  2020-11-03 09:47:00,615   BaseBodyChainSyncer  Imported block 2 (1 txs) in 0.04 seconds, lagging 0 blocks | 2y0m0w
<Manager[BeamDownloader] flags=SRcfe>: task _match_predictive_node_requests_to_peers[daemon=True] exited with error: Daemon task _match_predictive_node_requests_to_peers[daemon=True] exited
 WARNING  2020-11-03 09:47:02,559        BeamDownloader  Leaving _match_predictive_node_requests_to_peers
 WARNING  2020-11-03 09:47:02,560        BeamDownloader  BeamDownloader; BeamStateBackfill
 WARNING  2020-11-03 09:47:02,560        BeamDownloader  <Manager[BeamDownloader] flags=SRcfe>; <Manager[BeamStateBackfill] flags=SrCFe>
<Manager[BeamSyncer] flags=SRcfe>: task BeamDownloader[daemon=True] exited with error: Daemon task _match_predictive_node_requests_to_peers[daemon=True] exited
<Manager[BeamSyncService] flags=SRcfe>: task BeamSyncer[daemon=False] exited with error: Daemon task _match_predictive_node_requests_to_peers[daemon=True] exited
    INFO  2020-11-03 09:47:02,577            FullServer  TCP Listener finished, cancelling Server
<bound method AsyncioIsolatedComponent._do_run of <trinity.components.builtin.syncer.component.SyncerComponent object at 0x7f844989d610>> raised an unexpected exception
Traceback (most recent call last):
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/asyncio_run_in_process/_child.py", line 205, in run_process
    runner(async_fn, args, to_parent)
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/asyncio_run_in_process/_child.py", line 168, in _run_on_asyncio
    result: Any = loop.run_until_complete(_do_async_fn(async_fn, args, to_parent, loop))
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/asyncio_run_in_process/_child.py", line 160, in _do_async_fn
    return await async_fn_task
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/asyncio_run_in_process/_child.py", line 85, in _handle_coro
    return await coro_task
  File "/home/salgado/src/snakecharmers/trinity/trinity/extensibility/asyncio.py", line 79, in _do_run
    await wait_first(
  File "/home/salgado/src/snakecharmers/trinity/p2p/asyncio_utils.py", line 69, in wait_first
    raise done_task.exception()
  File "/home/salgado/src/snakecharmers/trinity/trinity/components/builtin/syncer/component.py", line 394, in do_run
    await wait_first(tasks, max_wait_after_cancellation=2)
  File "/home/salgado/src/snakecharmers/trinity/p2p/asyncio_utils.py", line 69, in wait_first
    raise done_task.exception()
  File "/home/salgado/src/snakecharmers/trinity/trinity/components/builtin/syncer/component.py", line 402, in launch_sync
    await strategy.sync(
  File "/home/salgado/src/snakecharmers/trinity/trinity/components/builtin/syncer/component.py", line 235, in sync
    await manager.wait_finished()
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/async_generator/_util.py", line 42, in __aexit__
    await self._agen.asend(None)
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/async_service/asyncio.py", line 458, in background_asyncio_service
    raise MultiError(
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/async_service/base.py", line 324, in _run_and_manage_task
    await task.run()
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/async_service/base.py", line 169, in run
    await self.child_manager.run()
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/async_service/asyncio.py", line 255, in run
    raise MultiError(
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/async_service/base.py", line 324, in _run_and_manage_task
    await task.run()
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/async_service/base.py", line 169, in run
    await self.child_manager.run()
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/async_service/asyncio.py", line 255, in run
    raise MultiError(
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/async_service/base.py", line 324, in _run_and_manage_task
    await task.run()
  File "/home/salgado/virtualenvs/trinity/lib/python3.8/site-packages/async_service/asyncio.py", line 37, in run
    raise DaemonTaskExit(f"Daemon task {self} exited")
async_service.exceptions.DaemonTaskExit: Daemon task _match_predictive_node_requests_to_peers[daemon=True] exited

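That happens because BeamStateBackfill._run_backfill() completes, which causes BeamStateBackfill to be cancelled and, in turn, _match_predictive_node_requests_to_peers() to exit. Since that is a daemon task, its exit raises DaemonTaskExit, which propagates up through BeamDownloader, BeamSyncer and BeamSyncService and takes the node down.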
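
To illustrate the failure mode, here is a minimal sketch in plain asyncio. It does not use async_service or any Trinity code: `DaemonTaskExit` is a local stand-in for the real exception, and `run_daemon`, `run_backfill` and `match_requests_to_peers` are hypothetical names. The only assumption is that the daemon loop keeps running only while the backfill is still alive.

```python
import asyncio


class DaemonTaskExit(Exception):
    """Local stand-in for the error raised when a daemon task returns."""


async def run_daemon(name, coro_fn):
    # A daemon task is expected to run until cancelled, so returning is an error.
    await coro_fn()
    raise DaemonTaskExit(f"Daemon task {name} exited")


async def demo():
    backfill_done = asyncio.Event()

    async def run_backfill():
        # Hypothetical backfill: finishes as soon as all state is downloaded.
        await asyncio.sleep(0.01)
        backfill_done.set()

    async def match_requests_to_peers():
        # Hypothetical daemon loop that only runs while the backfill is alive.
        while not backfill_done.is_set():
            await asyncio.sleep(0.001)

    backfill = asyncio.ensure_future(run_backfill())
    try:
        await run_daemon("match_requests_to_peers", match_requests_to_peers)
    except DaemonTaskExit as exc:
        return str(exc)
    finally:
        await backfill
    return "still running"


result = asyncio.run(demo())
print(result)
```

The loop itself does nothing wrong here; it simply returns once its condition becomes false, and the supervisor treats that return as a failure.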

@carver any suggestions on how to fix this?

Hm, odd that this condition doesn't trigger in test_beam_syncer_backfills_all_state.

Without digging deeper, the simplest solution is to keep the backfill service from exiting at all: once the backfill completes, it stays alive but stops attempting any backfill activity.
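
A rough sketch of that idea, again in plain asyncio rather than async_service. `BackfillSketch` and its attributes are hypothetical and are not Trinity's BeamStateBackfill; the real change may look quite different. After the work is done, the service idles on an event that is never set, so it only stops when it is cancelled from outside.

```python
import asyncio


class BackfillSketch:
    """Hypothetical stand-in for a backfill service, not Trinity's class."""

    def __init__(self):
        self.backfill_complete = asyncio.Event()

    async def run(self):
        await self._run_backfill()
        self.backfill_complete.set()
        # Idle until cancelled so that dependants never see the service finish.
        await asyncio.Event().wait()

    async def _run_backfill(self):
        # Placeholder for the actual download work.
        await asyncio.sleep(0.01)


async def demo():
    service = BackfillSketch()
    task = asyncio.ensure_future(service.run())
    await service.backfill_complete.wait()
    await asyncio.sleep(0.01)
    # The backfill is complete, yet the service task has not exited.
    still_running = not task.done()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return still_running, task.cancelled()


still_running, was_cancelled = asyncio.run(demo())
```

The trade-off is that anything waiting for the service to finish has to watch a completion flag such as `backfill_complete` instead of the service's exit.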

Fixed by #2097 -- I think