filecoin-project/go-f3

Replace queue of future messages with some other catch-up mechanism

Closed this issue · 5 comments

#124 (for #122) added a queue of messages for future instances, but it opens up a lot of DoS risk (#12) because those messages can't be validated. This should be removed before production use.

Brief thoughts about what to do instead:

  • Nodes can queue the messages that they have received and validated for the current instance and make them available for fetching by lagging nodes, until the instance completes. After a decision, the finality certificate can catch a node up.
  • Nodes might be able to skip rounds in a multi-round evaluation by fetching/observing evidence that a strong quorum of other nodes have reached a particular round. The evidence attached to a CONVERGE message is close to this, but not quite right yet. Perhaps we can tweak it to make it self-contained. Then we'll only need refetching of messages for a single round.
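A minimal sketch of the first idea, assuming a simplified `Message` type and a hypothetical `CurrentInstanceStore` (neither is taken from go-f3): the node keeps only messages it has already validated for the current instance, serves them to lagging peers per round, and drops them once the instance decides.

```go
package main

import "sync"

// Message is a hypothetical, simplified stand-in for a validated message.
type Message struct {
	Instance uint64
	Round    uint64
	Sender   uint64
	Payload  []byte
}

// CurrentInstanceStore holds validated messages for the current instance only,
// so that lagging peers can fetch them until the instance completes.
type CurrentInstanceStore struct {
	mu       sync.Mutex
	instance uint64
	byRound  map[uint64][]Message
}

func NewCurrentInstanceStore(instance uint64) *CurrentInstanceStore {
	return &CurrentInstanceStore{instance: instance, byRound: make(map[uint64][]Message)}
}

// Add records a message the caller has already validated. Messages for any
// other instance are rejected, so nothing unvalidated is ever stored.
func (s *CurrentInstanceStore) Add(m Message) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if m.Instance != s.instance {
		return false
	}
	s.byRound[m.Round] = append(s.byRound[m.Round], m)
	return true
}

// Fetch returns a copy of the slice of messages stored for one round.
// ok is false when the requested instance is not the current one; in that
// case the requester is expected to catch up via a finality certificate.
func (s *CurrentInstanceStore) Fetch(instance, round uint64) (msgs []Message, ok bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if instance != s.instance {
		return nil, false
	}
	out := make([]Message, len(s.byRound[round]))
	copy(out, s.byRound[round])
	return out, true
}

// Decide discards all stored messages and moves on to the next instance.
func (s *CurrentInstanceStore) Decide() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.instance++
	s.byRound = make(map[uint64][]Message)
}
```

Fetching per round fits the second idea: if self-contained evidence lets a node skip ahead, it would only need `Fetch` for the single round it lands in.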

An alternative to the in-instance catch-up mechanism could be speculatively starting a new instance.
Handling it without a power table delay is challenging, as we would need to re-verify all the decisions we verified speculatively once the power table becomes available.
With a power table delay, we could passively listen, progressing through stages and rounds until we (1) get confirmation that the last instance completed successfully and (2) know which chain we want to vote on.

@Kubuxu and I discussed this and concluded that, with a power table delay, we don't need speculative execution. We can just check signatures and queue messages. I think it will be safe to drop equivocations at QUALITY too, or in any phase that proceeds from a timeout regardless of messages received.

See #151. These probably need to be done together:

  • Implement delayed power tables, maintaining a list of the table for the most recent N instances
  • Use these power tables, with an offset, instead of fetching one from the host with every new chain?
    • Perhaps receive resulting power table in response to notifying a decision
  • Drop messages for which the power table isn't available (10 instances ahead of current), thus bounding the queue
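The list above could look roughly like the following sketch. All names here (`PowerTable`, `FutureQueue`, `Lookahead`) are hypothetical, and signature verification is assumed to happen in the caller against the same table. The queue is bounded by instance range only; a per-sender cap would still be needed to bound it by size.

```go
package main

// PowerTable is a hypothetical placeholder mapping participant ID to power.
type PowerTable map[uint64]int64

// Message is a simplified stand-in carrying only what the queue needs.
type Message struct {
	Instance uint64
	Sender   uint64
}

// Lookahead is how many instances ahead of the current one a power table can
// be known for. The value 10 follows the discussion; it is not a go-f3 constant.
const Lookahead = 10

// FutureQueue holds messages for upcoming instances whose power table is known.
type FutureQueue struct {
	current uint64
	tables  map[uint64]PowerTable
	queued  map[uint64][]Message
}

func NewFutureQueue(current uint64) *FutureQueue {
	return &FutureQueue{
		current: current,
		tables:  make(map[uint64]PowerTable),
		queued:  make(map[uint64][]Message),
	}
}

// inWindow reports whether instance lies in [current, current+Lookahead].
func (q *FutureQueue) inWindow(instance uint64) bool {
	return instance >= q.current && instance <= q.current+Lookahead
}

// SetPowerTable records the table for an instance inside the window.
func (q *FutureQueue) SetPowerTable(instance uint64, pt PowerTable) bool {
	if !q.inWindow(instance) {
		return false
	}
	q.tables[instance] = pt
	return true
}

// Enqueue accepts a message only if its instance is inside the window, the
// power table for that instance is available, and the sender holds power in it.
func (q *FutureQueue) Enqueue(m Message) bool {
	if !q.inWindow(m.Instance) {
		return false
	}
	pt, ok := q.tables[m.Instance]
	if !ok || pt[m.Sender] <= 0 {
		return false
	}
	q.queued[m.Instance] = append(q.queued[m.Instance], m)
	return true
}

// Advance moves to a new current instance, discards tables and messages for
// earlier instances, and returns the messages queued for the new instance.
func (q *FutureQueue) Advance(next uint64) []Message {
	for i := range q.tables {
		if i < next {
			delete(q.tables, i)
		}
	}
	for i := range q.queued {
		if i < next {
			delete(q.queued, i)
		}
	}
	q.current = next
	msgs := q.queued[next]
	delete(q.queued, next)
	return msgs
}
```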

The host should fetch finality certificates to skip over instances to get within 10 of the currently executing one.
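As a rough illustration of the arithmetic, here is a sketch of how a host might work out which finality certificates to fetch. `CertsToFetch` and `MaxLag` are hypothetical names, and the sketch assumes the host has learned the network's currently executing instance by some other means.

```go
package main

// MaxLag is the largest gap, in instances, that the message queue can absorb.
// The value 10 follows the discussion; it is not a go-f3 constant.
const MaxLag = 10

// CertsToFetch returns the half-open range [from, to) of instances whose
// finality certificates are needed so that the local current instance ends up
// within MaxLag of the network's. needed is false when no fetch is required.
func CertsToFetch(local, network uint64) (from, to uint64, needed bool) {
	if network <= local || network-local <= MaxLag {
		return 0, 0, false
	}
	return local, network - MaxLag, true
}
```

For example, a node at instance 100 with the network at 125 would fetch certificates for instances 100 through 114, landing at 115, which is exactly 10 behind.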

Action item captured in #169. @anorth, OK for us to close?

This issue is about messages for future instances, not future rounds in this instance. But it's tightly coupled to #151, so I'll expand scope there and close this one.