prysmaticlabs/prysm

Check state root before saving a state in disk

terencechain opened this issue · 4 comments

We have seen invalid states in the DB even before Electra, and this typically marks the death of a node. An invalid state on disk is really hard to recover from: the node will start from the invalid state, and it takes manual intervention to delete that state from the DB. This raises the question: how does an invalid state get into the DB in the first place? It should never enter the DB if we perform rigorous checking.

Given that saving state to disk is an infrequent and safety-critical event, I think we should check that the state's hash tree root is correct and matches the state root in the corresponding block before saving the state to the DB. This check can be done at the DB level.

The downside is that it prolongs the save-state time: it costs one extra state hash tree root computation and one comparison. However, saving state typically happens off the critical path, such as when migrating hot to cold storage in the background or when a node shuts down, so I think this trade-off is acceptable. We should strive for maximum safety here. What does everyone think?

Gm, are there any issues related to this one? I'm curious which cases you ran into regarding the following:

> As we have seen invalid state in the DB even before Electra, this typically marks the death of a node.

Can you explain the latency trade-off? Are we talking about seconds or microseconds?

> Can you explain the latency trade off? Are we talking about seconds or microseconds?

It's one hash tree root of the beacon state, so I suspect 100ms - 500ms as a pessimistic estimate. But I'd argue this is not a big deal, because we don't (and shouldn't) save the beacon state to the DB in the hot path anyway.

Out-of-band feedback I received suggests that we shouldn't prolong the save-state DB time, even if it's an infrequent event, because we can't predict the future. I'll close this issue and explore other alternatives.