ethereum/trinity

PeerPool leaked TimeoutError and apparently got stuck, but did not terminate

gsalgado opened this issue · 1 comments

A TimeoutError leaked from the PeerPool, which apparently stopped running after that (as evidenced by lack of emitted logs afterwards), but didn't actually terminate so trinity continued running without actually syncing. (Originally reported in #2018)

�[1m�[33m WARNING  2020-09-04 04:19:18,179       Sync / PeerPool  Event loop blocked or overloaded: delay=2.005s, tasks=74�[0m
�[1m�[31m   ERROR  2020-09-04 04:19:18,181               asyncio  Exception in callback <TaskWakeupMethWrapper object at 0x7fa47b456e90>
handle: <Handle <TaskWakeupMethWrapper object at 0x7fa47b456e90>>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 70, in uvloop.loop.Handle._run
  File "/usr/local/lib/python3.7/site-packages/async_service/base.py", line 300, in _run_and_manage_task
    await task.run()
  File "/usr/local/lib/python3.7/site-packages/async_service/base.py", line 165, in run
    await self.child_manager.run()
  File "/usr/local/lib/python3.7/site-packages/async_service/asyncio.py", line 246, in run
    for _, exc_value, exc_tb in self._errors
trio.MultiError: TimeoutError('Tasks never returned after being cancelled: %s', {<Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>}), TimeoutError('Tasks never returned after being cancelled: %s', {<Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>})

Details of embedded exception 1:

  Traceback (most recent call last):
    File "/usr/src/app/trinity/p2p/asyncio_utils.py", line 82, in cancel_pending_tasks
      yield
    File "/usr/src/app/trinity/p2p/asyncio_utils.py", line 54, in wait_first
      done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    File "/usr/local/lib/python3.7/asyncio/tasks.py", line 389, in wait
      return await _wait(fs, timeout, return_when, loop)
    File "/usr/local/lib/python3.7/asyncio/tasks.py", line 482, in _wait
      await waiter
  concurrent.futures._base.CancelledError

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/usr/local/lib/python3.7/site-packages/async_service/base.py", line 300, in _run_and_manage_task
      await task.run()
    File "/usr/local/lib/python3.7/site-packages/async_service/base.py", line 165, in run
      await self.child_manager.run()
    File "/usr/local/lib/python3.7/site-packages/async_service/asyncio.py", line 246, in run
      for _, exc_value, exc_tb in self._errors
    File "/usr/local/lib/python3.7/site-packages/async_service/base.py", line 300, in _run_and_manage_task
      await task.run()
    File "/usr/local/lib/python3.7/site-packages/async_service/asyncio.py", line 35, in run
      await self._async_fn(*self._async_fn_args)
    File "/usr/src/app/trinity/p2p/connection.py", line 133, in run_behaviors
      await wait_first(futures, max_wait_after_cancellation=2)
    File "/usr/src/app/trinity/p2p/asyncio_utils.py", line 69, in wait_first
      raise done_task.exception()
    File "/usr/local/lib/python3.7/contextlib.py", line 188, in __aexit__
      await self.gen.athrow(typ, value, traceback)
    File "/usr/src/app/trinity/p2p/asyncio_utils.py", line 113, in cancel_pending_tasks
      raise MultiError(errors)
  concurrent.futures._base.TimeoutError: ('Tasks never returned after being cancelled: %s', {<Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>})

Details of embedded exception 2:

  Traceback (most recent call last):
    File "/usr/src/app/trinity/p2p/asyncio_utils.py", line 82, in cancel_pending_tasks
      yield
    File "/usr/src/app/trinity/p2p/asyncio_utils.py", line 54, in wait_first
      done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    File "/usr/local/lib/python3.7/asyncio/tasks.py", line 389, in wait
      return await _wait(fs, timeout, return_when, loop)
    File "/usr/local/lib/python3.7/asyncio/tasks.py", line 482, in _wait
      await waiter
  concurrent.futures._base.CancelledError

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/usr/local/lib/python3.7/site-packages/async_service/base.py", line 300, in _run_and_manage_task
      await task.run()
    File "/usr/local/lib/python3.7/site-packages/async_service/base.py", line 165, in run
      await self.child_manager.run()
    File "/usr/local/lib/python3.7/site-packages/async_service/asyncio.py", line 246, in run
      for _, exc_value, exc_tb in self._errors
    File "/usr/local/lib/python3.7/site-packages/async_service/base.py", line 300, in _run_and_manage_task
      await task.run()
    File "/usr/local/lib/python3.7/site-packages/async_service/asyncio.py", line 35, in run
      await self._async_fn(*self._async_fn_args)
    File "/usr/src/app/trinity/p2p/connection.py", line 133, in run_behaviors
      await wait_first(futures, max_wait_after_cancellation=2)
    File "/usr/src/app/trinity/p2p/asyncio_utils.py", line 69, in wait_first
      raise done_task.exception()
    File "/usr/local/lib/python3.7/contextlib.py", line 188, in __aexit__
      await self.gen.athrow(typ, value, traceback)
    File "/usr/src/app/trinity/p2p/asyncio_utils.py", line 113, in cancel_pending_tasks
      raise MultiError(errors)
  concurrent.futures._base.TimeoutError: ('Tasks never returned after being cancelled: %s', {<Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>, <Task cancelled coro=<wait_first() done, defined at /usr/src/app/trinity/p2p/asyncio_utils.py:31>>})�[0m
   

I cannot reproduce it, but maybe once we fix #2018 this will go away