tangle-network/dkg-substrate

[SPEC] Post-mortem 7/14/2023

Overview

  • Always delete debug logs older than the previous session (tangle-network/tangle#207).
  • Delete old local keys from before the session prior to the previous one (or simply from before the previous session).
  • Identify how to make signing set selection intelligent using reputation.
  • Identify how to avoid ending up with a corrupted DB, or at least how to avoid corrupting our LocalKey.
  • Avoid the Substrate offchain database for LocalKey storage; use a RocksDB or SQL database directly.
  • Identify how to award reputation points, not only take them away.
  • Use reputation in signing set selection.

Reputation + Adaptive mechanisms for signer selection

  • Award reputation on successful generation of a new key, since all best_authorities are required to participate.
  • Use handshakes to assign reputation at the networking layer.
  • Perform handshakes frequently within each session (perhaps one per minute).
  • Use a combined reputation (Substrate + libp2p DKG gadget + on-chain DKG metadata).
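
As a rough illustration of the combined score described above, here is a minimal sketch. The `Reputation` struct, its field names, the reward and penalty constants, and the choice to credit keygen success to the on-chain component are all assumptions made for this example; none of them come from the existing codebase.

```rust
/// Hypothetical per-peer reputation, one component per source listed above.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Reputation {
    /// Score reported by the Substrate networking layer.
    pub substrate: i64,
    /// Score kept by the libp2p DKG gadget (driven by handshakes).
    pub gadget: i64,
    /// Score derived from on-chain DKG metadata.
    pub onchain: i64,
}

// Illustrative values only; real values would need tuning.
pub const KEYGEN_REWARD: i64 = 10;
pub const HANDSHAKE_REWARD: i64 = 1;
pub const HANDSHAKE_PENALTY: i64 = 5;

impl Reputation {
    /// Reward a peer after a successful key generation round.
    pub fn on_keygen_success(&mut self) {
        self.onchain = self.onchain.saturating_add(KEYGEN_REWARD);
    }

    /// Reward a successful handshake, penalize a failed one.
    pub fn on_handshake(&mut self, ok: bool) {
        self.gadget = if ok {
            self.gadget.saturating_add(HANDSHAKE_REWARD)
        } else {
            self.gadget.saturating_sub(HANDSHAKE_PENALTY)
        };
    }

    /// Combined score as a plain sum of the three components.
    pub fn combined(&self) -> i64 {
        self.substrate
            .saturating_add(self.gadget)
            .saturating_add(self.onchain)
    }
}
```

A plain sum is the simplest way to combine the three sources; weighting them differently is an open design question.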

Research

  • Perform handshakes on proposed signing sets (collecting handshakes from all the peers involved).
  • Generate multiple sets and cycle through them after failures until one succeeds.
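
The cycling idea could look roughly like the sketch below. The function name `first_successful_set`, the `max_attempts` cap, and the `try_sign` callback (standing in for a full signing round) are hypothetical names introduced only for illustration.

```rust
/// Cycle through pre-generated candidate signing sets until one succeeds.
///
/// `try_sign` stands in for running the signing protocol with a given set and
/// returns `true` on success. Returns the index of the first set that
/// succeeded, or `None` if `max_attempts` rounds all failed.
pub fn first_successful_set<F>(
    candidates: &[Vec<u16>],
    max_attempts: usize,
    mut try_sign: F,
) -> Option<usize>
where
    F: FnMut(&[u16]) -> bool,
{
    if candidates.is_empty() {
        return None;
    }
    for attempt in 0..max_attempts {
        // Wrap around so we keep cycling after reaching the last set.
        let idx = attempt % candidates.len();
        if try_sign(candidates[idx].as_slice()) {
            return Some(idx);
        }
    }
    None
}
```

The attempt cap is an assumption added so the loop terminates even when no set can sign.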

Short-term proposal

  1. Generate a random set, as we currently do.
  2. Start signing with that set.
  3. Identify parties that time out or err, remove them, and start building a new signing set deterministically.
  4. Replace erring parties with new parties one at a time.
  5. Start a new signing protocol with that new set.
  6. If we haven’t heard from a party, the only way to sign with it in the future is to complete a successful handshake with it again.
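
Steps 3, 4, and 6 above can be sketched as follows. The function `rebuild_signing_set`, the `PartyId` alias, and the `reachable` set (parties with a recent successful handshake) are assumptions for this example. Choosing replacements in ascending id order is one possible deterministic rule, not necessarily the one the gadget should use.

```rust
use std::collections::BTreeSet;

/// Hypothetical party identifier.
pub type PartyId = u16;

/// Replace each erring party in `current`, one at a time, with the
/// lowest-id spare party that has a recent successful handshake.
///
/// Returns `None` when there are not enough spare parties.
pub fn rebuild_signing_set(
    current: &[PartyId],
    erring: &BTreeSet<PartyId>,
    reachable: &BTreeSet<PartyId>,
) -> Option<Vec<PartyId>> {
    // BTreeSet iterates in ascending order, so every node that sees the same
    // inputs derives the same replacements.
    let mut spares = reachable
        .iter()
        .copied()
        .filter(|p| !erring.contains(p) && !current.contains(p));

    let mut next = Vec::with_capacity(current.len());
    for &party in current {
        if erring.contains(&party) {
            // Swap in the next spare; bail out if none are left.
            next.push(spares.next()?);
        } else {
            next.push(party);
        }
    }
    Some(next)
}
```

Because only parties in `reachable` are eligible, a party that went silent re-enters selection only after it completes a handshake again.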

Improvement

  1. When an error occurs, we select the first signing set that removes the erring party.
  2. We gossip this set to the network, in effect saying "I am going to sign with this set because I know I'm connected to these parties".
  3. Upon receiving this message, nodes can decide to do the same.
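
A minimal sketch of this flow is shown below, with the gossip transport left out. The `SigningSetProposal` message, `propose_after_error`, and `should_adopt` are hypothetical names. The adoption rule (adopt only if we are in the set and connected to every other member) is an assumed policy, since the steps above leave that decision to each node.

```rust
/// Hypothetical party identifier.
pub type PartyId = u16;

/// Hypothetical gossip message announcing the set a node intends to sign with.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SigningSetProposal {
    pub proposer: PartyId,
    pub set: Vec<PartyId>,
}

/// Step 1: pick the first candidate set that does not contain the erring party.
pub fn propose_after_error(
    me: PartyId,
    candidates: &[Vec<PartyId>],
    erring: PartyId,
) -> Option<SigningSetProposal> {
    candidates
        .iter()
        .find(|set| !set.contains(&erring))
        .map(|set| SigningSetProposal { proposer: me, set: set.clone() })
}

/// Step 3: a receiving node adopts the proposal only if it is a member of the
/// set and is connected to every other member (assumed policy).
pub fn should_adopt(
    proposal: &SigningSetProposal,
    connected: &[PartyId],
    me: PartyId,
) -> bool {
    proposal.set.contains(&me)
        && proposal
            .set
            .iter()
            .all(|p| *p == me || connected.contains(p))
}
```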