dtr-org/unit-e

Finalizer with 2/3 of total deposit stops finalizing after a few epochs

Opened this issue · 1 comments

Describe the bug
Consider the following setup:

  1. There are two finalizers.
  2. The first finalizer holds 2/3 of the total deposit and is able to finalize an epoch alone.
  3. Only the first finalizer casts votes.

After a few finalized epochs finalization fails. For some reason, the votes of the first finalizer become unable to finalize an epoch.

To Reproduce
The following test

#!/usr/bin/env python3

from test_framework.test_framework import UnitETestFramework
from test_framework.util import (
    assert_finalizationstate,
    disconnect_nodes,
    generate_block,
)

ESPERANZA_CONFIG = '-esperanzaconfig={"epochLength": 5}'


class DepositBug(UnitETestFramework):
    def set_test_params(self):
        self.num_nodes = 3
        self.extra_args = [
            [ESPERANZA_CONFIG],
            [ESPERANZA_CONFIG, '-validating=1'],
            [ESPERANZA_CONFIG, '-validating=1'],
        ]
        self.setup_clean_chain = True

    def run_test(self):
        p0, v0, v1 = self.nodes
        self.setup_stake_coins(p0, v0, v1)
        # Leave IBD
        self.generate_sync(p0)

        v0_addr = v0.getnewaddress("", "legacy")
        v0.deposit(v0_addr, 3000)

        v1_addr = v1.getnewaddress("", "legacy")
        v1.deposit(v1_addr, 1500)
        generate_block(p0, count=14)

        disconnect_nodes(p0, v0.index)
        disconnect_nodes(p0, v1.index)
        disconnect_nodes(v0, v1.index)

        generate_block(p0, count=1)
        assert_finalizationstate(p0, {'currentEpoch': 4,
                                      'lastJustifiedEpoch': 2,
                                      'lastFinalizedEpoch': 2,
                                      'validators': 2})

        self.wait_for_vote_and_disconnect(finalizer=v0, node=p0)

        generate_block(p0, count=5)
        assert_finalizationstate(p0, {'currentEpoch': 5,
                                      'lastJustifiedEpoch': 3,
                                      'lastFinalizedEpoch': 3})

        self.wait_for_vote_and_disconnect(finalizer=v0, node=p0)

        generate_block(p0, count=5)
        assert_finalizationstate(p0, {'currentEpoch': 6,
                                      'lastJustifiedEpoch': 4,
                                      'lastFinalizedEpoch': 4})

        self.wait_for_vote_and_disconnect(finalizer=v0, node=p0)

        generate_block(p0, count=5)
        assert_finalizationstate(p0, {'currentEpoch': 7,
                                      'lastJustifiedEpoch': 5,
                                      'lastFinalizedEpoch': 5})



if __name__ == '__main__':
    DepositBug().main()

fails with the error:

Traceback (most recent call last):
  File "test/functional/test_framework/test_framework.py", line 203, in main
    self.run_test()
  File "test/functional/finalization_deposit_bug.py", line 87, in run_test
    'lastFinalizedEpoch': 5})
  File "test/functional/test_framework/util.py", line 313, in assert_finalizationstate
    raise AssertionError('\n\t\t\t\t'.join(errors))
AssertionError: lastJustifiedEpoch: not(4 == 5)
				lastFinalizedEpoch: not(4 == 5)

Expected behavior
Finalizer with 2/3 of total deposit should be able to finalize any epoch if it's the only finalizer which votes.

@Gnappuraz and I reviewed it, and this issue occurs because we want to finalize epoch not when 2/3 of votes are reached but when ">2/3" of votes are reached. The way how it's implemented, there is an equal comparison ">=2/3" but! current total votes don't include the current reward factor; however, the threshold (2/3) includes it which "more or less" turns this ">=" sign into ">" one.

Here we can see https://github.com/dtr-org/unit-e/blob/master/src/esperanza/finalizationstate.cpp#L529-L532 that reward was attributed to m_cur_dyn_deposits and m_prev_dyn_deposits but not to curDynastyVotes or prevDynastyVotes

This implementation we borrowed from Casper, and I think we should change it to have a strict >2/3 comparison and attribute current reward to both, total votes and the threshold. Then we won't have this "weird behavior" when the current reward factor is >0, and now we see that 2/3 of votes don't finalize when they used to finalize. Another issue with the current implementation is that if the reward factor is too large (because there are no finalization for decades*), you'll never able to finalize, even if everyone votes