Event scanning seems to leave node unreachable after node can't reach other node(s) in DKG cohort
This issue arose during a ritual initiation on lynx. One of the supposedly "active" staking providers (0x24dbb0BEE134C3773D2C1791d65d99e307Fe86CF) no longer has a corresponding running node, but it gets sampled anyway because TACoChildApplication still considers it active.
Each node in the cohort tries to reach all other nodes in the cohort when performing round 1 of the protocol. However, since one of the nodes in the cohort is not reachable, the other nodes seem to raise an exception during perform_round_1 when trying to learn about the non-running node (i.e. in block_until_specific_nodes_are_known()). The event scanning task then crashes and tries to restart itself. On restart, the node again tries to learn about the non-running node, and the cycle repeats consistently.
It seems the event scanner task just repeatedly crashes and restarts. This occurs because the exception raised when nodes in the cohort can't be contacted propagates out of scan_chunk instead of being handled there.
This cycle seems to render the node unreachable, i.e. the status page for nodes caught in this loop can't be loaded, and Porter can't ping the node either.
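To illustrate the failure mode described above, here is a minimal, self-contained sketch of one way a scanner could isolate per-event hook failures so that a single unreachable cohort member does not abort the whole scan. It is not the nucypher implementation and not necessarily what #3390 does: the `NotEnoughTeachers` class, `perform_round_1`, `scan_chunk`, and the event dictionaries below are simplified stand-ins that only borrow the names from the traceback.

```python
class NotEnoughTeachers(Exception):
    """Stand-in for the exception seen in the traceback (not the real class)."""


def perform_round_1(event):
    # Hypothetical hook: fails when any cohort member cannot be reached,
    # mimicking block_until_specific_nodes_are_known() timing out.
    unreachable = event.get("unreachable", set())
    if unreachable:
        raise NotEnoughTeachers(
            f"didn't find these {len(unreachable)} nodes: {unreachable}"
        )
    return f"round 1 done for ritual #{event['ritual_id']}"


def scan_chunk(events, hook):
    # Run the hook for every event; record failures instead of letting one
    # exception propagate and crash the scanning task.
    processed, failed = [], []
    for event in events:
        try:
            processed.append(hook(event))
        except NotEnoughTeachers as exc:
            failed.append((event["ritual_id"], str(exc)))
    return processed, failed


events = [
    {"ritual_id": 3, "unreachable": {"0x24dbb0BEE134C3773D2C1791d65d99e307Fe86CF"}},
    {"ritual_id": 4},
]
processed, failed = scan_chunk(events, perform_round_1)
print(processed)  # events whose hook completed
print(failed)     # (ritual_id, error message) for events whose hook failed
```

In this sketch the failed ritual is reported but the loop continues to the next event, so the scanner would keep advancing rather than restarting on the same block range. Whether a failed event should be retried later or dropped is a separate design decision that the sketch does not address.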
(0x890069) Scanning events in block range 44811183 - 44812107
performing round 1 of DKG ritual #3 from blocktime 1705343058 with authority 0x3B42d26E19FF860bC4dEbB920DD8caA53F93c600.
Error during event hook: After 60 seconds and 0 rounds, didn't find these 1 nodes: {'0x24dbb0BEE134C3773D2C1791d65d99e307Fe86CF'}
Error during ritual event scanning: Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/nucypher/blockchain/eth/trackers/dkg.py", line 66, in handle_errors
self.start(now=True)
File "/usr/local/lib/python3.12/site-packages/nucypher/utilities/task.py", line 28, in start
d = self._task.start(interval=self.INTERVAL, now=now)
File "/usr/local/lib/python3.12/site-packages/twisted/internet/task.py", line 206, in start
self()
File "/usr/local/lib/python3.12/site-packages/twisted/internet/task.py", line 251, in __call__
d = maybeDeferred(self.f, *self.a, **self.kw)
--- <exception caught here> ---
File "/usr/local/lib/python3.12/site-packages/twisted/internet/defer.py", line 209, in maybeDeferred
result = f(*args, **kwargs)
File "/usr/local/lib/python3.12/site-packages/nucypher/blockchain/eth/trackers/dkg.py", line 60, in run
self.scanner()
File "/usr/local/lib/python3.12/site-packages/nucypher/blockchain/eth/trackers/dkg.py", line 431, in scan
self.__scan(
File "/usr/local/lib/python3.12/site-packages/nucypher/blockchain/eth/trackers/dkg.py", line 406, in __scan
result, total_chunks_scanned = self.scanner.scan(start_block, end_block)
File "/usr/local/lib/python3.12/site-packages/nucypher/utilities/events.py", line 343, in scan
actual_end_block, end_block_timestamp, new_entries = self.scan_chunk(current_block, estimated_end_block)
File "/usr/local/lib/python3.12/site-packages/nucypher/utilities/events.py", line 249, in scan_chunk
processed = self.process_event(event=evt, get_block_when=get_block_when)
File "/usr/local/lib/python3.12/site-packages/nucypher/blockchain/eth/trackers/dkg.py", line 43, in process_event
hook(event, get_block_when)
File "/usr/local/lib/python3.12/site-packages/nucypher/blockchain/eth/trackers/dkg.py", line 399, in _handle_ritual_event
d = self.__execute_action(ritual_event=ritual_event, timestamp=timestamp)
File "/usr/local/lib/python3.12/site-packages/nucypher/blockchain/eth/trackers/dkg.py", line 384, in __execute_action
return task()
File "/usr/local/lib/python3.12/site-packages/nucypher/blockchain/eth/trackers/dkg.py", line 378, in task
self.actions[event_type](timestamp=timestamp, **formatted_kwargs)
File "/usr/local/lib/python3.12/site-packages/nucypher/blockchain/eth/actors.py", line 426, in perform_round_1
nodes, transcripts = list(zip(*self._resolve_validators(ritual)))
File "/usr/local/lib/python3.12/site-packages/nucypher/blockchain/eth/actors.py", line 307, in _resolve_validators
self.block_until_specific_nodes_are_known(
File "/usr/local/lib/python3.12/site-packages/nucypher/network/nodes.py", line 707, in block_until_specific_nodes_are_known
raise self.NotEnoughTeachers(
nucypher.network.nodes.NotEnoughTeachers: After 60 seconds and 0 rounds, didn't find these 1 nodes: {'0x24dbb0BEE134C3773D2C1791d65d99e307Fe86CF'}
Restarting event scanner task!
Fixed via #3390.