[SPEC] Post-mortem 7/14/2023
drewstone commented
Overview
- Always delete debug logs from sessions older than the previous session (tangle-network/tangle#207).
- Delete local keys from sessions older than the one before the previous session (or just older than the previous session).
- Identify how to make signing set selection intelligent using reputation.
- Identify how to avoid ending up with a corrupted DB, or at least how to avoid corrupting our LocalKey.
- Avoid the Substrate offchain database for LocalKey storage; use a RocksDB or SQL database directly.
- Identify how to give reputation points and not only take them away.
- Use reputation in signing set selection.
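
The session-based cleanup in the first two items could look roughly like the sketch below. It assumes logs or keys are indexed by a numeric session id in an ordered map; `SessionId`, `prune_before`, and `keep_back` are hypothetical names, not part of the existing codebase.

```rust
use std::collections::BTreeMap;

/// Hypothetical session index type; the real code may use a different width.
type SessionId = u64;

/// Drops every entry recorded for a session older than `current - keep_back`.
/// With `keep_back = 1`, only the current and previous sessions survive.
/// Returns the number of sessions that were removed.
fn prune_before<V>(
    store: &mut BTreeMap<SessionId, V>,
    current: SessionId,
    keep_back: u64,
) -> usize {
    let cutoff = current.saturating_sub(keep_back);
    let before = store.len();
    // `split_off` returns all entries with key >= cutoff, which are the ones to keep.
    let kept = store.split_off(&cutoff);
    *store = kept;
    before - store.len()
}
```

Setting `keep_back = 2` would instead retain the "previous previous" session, which covers the more conservative option mentioned for local keys.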
Reputation + Adaptive mechanisms for signer selection
- Give reputation on successful new key generation, because all `best_authorities` are required to participate.
- Use handshakes to assign reputation at the networking layer.
- Do frequent handshakes per session (maybe 1 every minute).
- Use combined reputation (substrate + libp2p dkg gadget + onchain dkg metadata)
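
As a rough illustration of combining the three reputation sources and rewarding as well as penalizing peers, here is a minimal sketch. The struct layout, the weights, the reward and penalty amounts, and the `u32` peer ids are all assumptions made for the example; none of them come from the existing gadget.

```rust
/// Hypothetical per-peer scores from the three sources named above.
#[derive(Debug, Clone, Copy)]
struct ReputationInputs {
    substrate: i64, // Substrate networking reputation
    gadget: i64,    // libp2p DKG gadget reputation (handshakes, keygen)
    onchain: i64,   // on-chain DKG metadata reputation
}

/// Hypothetical weights; the right values would need tuning.
struct Weights {
    substrate: i64,
    gadget: i64,
    onchain: i64,
}

/// Weighted sum of the three sources, saturating instead of overflowing.
fn combined_score(r: ReputationInputs, w: &Weights) -> i64 {
    r.substrate
        .saturating_mul(w.substrate)
        .saturating_add(r.gadget.saturating_mul(w.gadget))
        .saturating_add(r.onchain.saturating_mul(w.onchain))
}

/// Rewards a successful handshake and penalizes a missed one (amounts are assumed).
fn apply_handshake(r: &mut ReputationInputs, succeeded: bool) {
    let delta = if succeeded { 1 } else { -5 };
    r.gadget = r.gadget.saturating_add(delta);
}

/// Picks the `t` peers with the highest combined score.
/// Ties are broken by ascending peer id so every node selects the same set.
fn select_signers(peers: &[(u32, ReputationInputs)], w: &Weights, t: usize) -> Vec<u32> {
    let mut scored: Vec<(i64, u32)> = peers
        .iter()
        .map(|(id, r)| (combined_score(*r, w), *id))
        .collect();
    scored.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(&b.1)));
    scored.into_iter().take(t).map(|(_, id)| id).collect()
}
```

The deterministic tie-break matters only if all nodes see the same scores; locally observed handshake reputation will differ between nodes, so a real design would have to decide which inputs are allowed to influence a set that everyone must agree on.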
Research
- Doing handshakes on proposed signing sets (getting handshakes from all the peers involved)
- Generate multiple sets and continue to cycle through them after failures and until success.
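
One way to picture the "generate multiple sets and cycle through them" idea is the sketch below. It enumerates candidate sets in a fixed order and tries each until one succeeds. The function names and the `try_sign` callback are hypothetical stand-ins for the real signing protocol.

```rust
/// Enumerates every size-`t` subset of `peers`, in the order the peers are given.
/// Note: the number of subsets grows combinatorially, so a real implementation
/// would likely cap or sample the candidates.
fn candidate_sets(peers: &[u32], t: usize) -> Vec<Vec<u32>> {
    fn extend(
        peers: &[u32],
        t: usize,
        start: usize,
        current: &mut Vec<u32>,
        out: &mut Vec<Vec<u32>>,
    ) {
        if current.len() == t {
            out.push(current.clone());
            return;
        }
        for i in start..peers.len() {
            current.push(peers[i]);
            extend(peers, t, i + 1, current, out);
            current.pop();
        }
    }
    let mut out = Vec::new();
    extend(peers, t, 0, &mut Vec::new(), &mut out);
    out
}

/// Tries each candidate set in order and returns the attempt index and the
/// first set for which `try_sign` reports success, or `None` if all fail.
fn sign_with_fallback<F>(sets: &[Vec<u32>], mut try_sign: F) -> Option<(usize, Vec<u32>)>
where
    F: FnMut(&[u32]) -> bool,
{
    for (attempt, set) in sets.iter().enumerate() {
        if try_sign(set) {
            return Some((attempt, set.clone()));
        }
    }
    None
}
```

Because the enumeration order is fixed, nodes that agree on the peer list and on which attempts failed would land on the same set without extra coordination.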
drewstone commented
Short-term proposal
- Generate a random set as we currently are.
- Start signing with that set.
- Identify timeouts/erring parties and remove them, then start building a new signing set deterministically.
- Replace erring parties with new parties one at a time.
- Start new signing protocol with that new set.
- If we haven’t heard from a party, the only way to sign with them in the future is to complete a successful handshake with them again.
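
The replacement step in this proposal might be sketched as follows. The sketch assumes `u32` peer ids, that replacements are taken in ascending id order for determinism, and that only peers with a recent successful handshake are eligible; `rebuild_signing_set` and its parameters are hypothetical.

```rust
use std::collections::BTreeSet;

/// Builds the next signing set by swapping each erring party, one at a time,
/// for the lowest-id eligible authority. A replacement is eligible if it is not
/// already in the set, has not erred, and has a live handshake.
/// Returns `None` when there are not enough eligible replacements.
fn rebuild_signing_set(
    current: &[u32],
    erring: &BTreeSet<u32>,
    authorities: &[u32],
    handshaked: &BTreeSet<u32>,
) -> Option<Vec<u32>> {
    let mut pool: Vec<u32> = authorities
        .iter()
        .copied()
        .filter(|p| !current.contains(p) && !erring.contains(p) && handshaked.contains(p))
        .collect();
    // Sorting makes the choice of replacement independent of input order.
    pool.sort_unstable();
    pool.dedup();
    let mut pool = pool.into_iter();

    let mut next = Vec::with_capacity(current.len());
    for &party in current {
        if erring.contains(&party) {
            // Replace the erring party in place, keeping the set size constant.
            next.push(pool.next()?);
        } else {
            next.push(party);
        }
    }
    Some(next)
}
```

An erring party re-enters the pool only once it is removed from `erring` and appears in `handshaked` again, which mirrors the handshake requirement in the last step.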
Improvement
- When an error occurs, we select the first signing set that removes the erring party.
- We gossip this set to the network, basically "I am going to sign with this set because I know I'm connected to these parties".
- Upon receiving this message, nodes can decide to do the same.
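
A receiving node's decision could be sketched like this. The message shape and the adoption rule (adopt only if the removed party is really absent and we are connected to every other member) are assumptions for illustration; the proposal above leaves the exact policy open.

```rust
use std::collections::BTreeSet;

/// Hypothetical gossip payload announcing the set a node intends to sign with.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SigningSetProposal {
    proposer: u32,
    removed: u32,  // the erring party that triggered the change
    set: Vec<u32>, // the proposed signing set
}

/// Returns true if the local node could reasonably follow the proposal:
/// the removed party must be absent from the set, and the local node must
/// be connected to every member other than itself.
fn should_adopt(local_id: u32, proposal: &SigningSetProposal, connected: &BTreeSet<u32>) -> bool {
    if proposal.set.contains(&proposal.removed) {
        // Malformed proposal: it still includes the party it claims to remove.
        return false;
    }
    proposal
        .set
        .iter()
        .all(|peer| *peer == local_id || connected.contains(peer))
}
```

A real message would also need to be signed by the proposer and tied to a specific session and signing round, so that stale or forged proposals can be rejected.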