ChainIndexer: make epoch validation more comprehensive
rvagg opened this issue · 3 comments
Currently we are just comparing counts of things, assuming that if we have the right number of messages, events and event entries, then the epoch is properly indexed. This isn't necessarily true and it would be good to validate in more detail. Events are particularly hard since there is a lot of data to be fetched and compared. We already have the data from the blockstore though, so that's not an additional expense (we need to load an entire AMT just to get its length so we know what size to compare).
SQLite has some sha3 functionality. It's not clear to me whether it is available programmatically or only on the sqlite3 CLI. If it is available programmatically then we could generate hashes of specific columns in the database and compare that digest to a locally generated digest of the same fields. A select over event fields: emitter, event_index, message (cid?), and the associated event_entry fields: indexed, flags, key, codec, and maybe even value to do the whole lot, and then a sha3 of the results. Attempting to reconstruct this locally would be an interesting exercise.
Unfortunately this would tie us to SQLite functionality, so if we made the database pluggable we'd want to abstract this away somehow so that we can do something similar for another database.
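To make the digest idea concrete, here is a rough sketch of the locally generated side in Go. It is only an illustration under several assumptions: the `Event` and `EventEntry` structs are hypothetical stand-ins for the columns listed above, not the real ChainIndexer types; SHA-256 from the standard library is swapped in for sha3 to keep the example dependency-free; and the length-prefixed encoding is an arbitrary choice that the database side would have to reproduce byte for byte for the two digests to match.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"hash"
)

// EventEntry and Event are hypothetical row shapes mirroring the columns
// named in the issue; they are NOT the real ChainIndexer types.
type EventEntry struct {
	Indexed bool
	Flags   uint8
	Key     string
	Codec   uint64
	Value   []byte
}

type Event struct {
	Emitter    []byte // emitter address bytes
	EventIndex uint64
	MessageCid []byte
	Entries    []EventEntry
}

// writeUint feeds a fixed-width big-endian integer into the hash.
func writeUint(h hash.Hash, v uint64) {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], v)
	h.Write(n[:])
}

// writeField feeds a length-prefixed byte string into the hash so that
// adjacent fields cannot run together ("ab"+"c" vs "a"+"bc").
func writeField(h hash.Hash, b []byte) {
	writeUint(h, uint64(len(b)))
	h.Write(b)
}

// DigestEvents hashes the events in the order given. Both sides of the
// comparison must produce rows in the same order (for example by message,
// then event_index). SHA-256 stands in for sha3 here (assumption).
func DigestEvents(events []Event) string {
	h := sha256.New()
	for _, ev := range events {
		writeField(h, ev.Emitter)
		writeUint(h, ev.EventIndex)
		writeField(h, ev.MessageCid)
		writeUint(h, uint64(len(ev.Entries)))
		for _, e := range ev.Entries {
			var indexed uint64
			if e.Indexed {
				indexed = 1
			}
			writeUint(h, indexed)
			writeUint(h, uint64(e.Flags))
			writeField(h, []byte(e.Key))
			writeUint(h, e.Codec)
			writeField(h, e.Value)
		}
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	entry := EventEntry{Indexed: true, Flags: 3, Key: "t1", Codec: 0x55, Value: []byte{9}}
	fromChain := []Event{{Emitter: []byte{1}, EventIndex: 0, MessageCid: []byte{2}, Entries: []EventEntry{entry}}}
	fromIndex := []Event{{Emitter: []byte{1}, EventIndex: 0, MessageCid: []byte{2}, Entries: []EventEntry{entry}}}
	fmt.Println("digests match:", DigestEvents(fromChain) == DigestEvents(fromIndex))
}
```

Keeping the digest behind a single function like this is also one way the pluggable-database concern could be handled: each backend would only need to supply rows in a fixed order, and the hashing would stay on the Go side.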
Or, alternatively: just do a bulk select and compare everything in one go.
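For comparison, the bulk-select variant could look something like the following in Go. Again this is a sketch with assumed names: `EntryRow` is a hypothetical flattened row carrying only a few of the columns, and the two slices are taken to be already loaded, one from the index and one rebuilt from the blockstore, in the same order.

```go
package main

import (
	"bytes"
	"fmt"
)

// EntryRow is a hypothetical flattened (event, entry) row; it is not a
// real ChainIndexer type and only carries a subset of the columns.
type EntryRow struct {
	EventIndex uint64
	Key        string
	Value      []byte
}

// CompareRows checks the indexed rows against the expected rows one by one
// and reports the first difference. Both slices must share the same ordering.
func CompareRows(indexed, expected []EntryRow) error {
	if len(indexed) != len(expected) {
		return fmt.Errorf("row count mismatch: indexed=%d expected=%d", len(indexed), len(expected))
	}
	for i := range expected {
		a, b := indexed[i], expected[i]
		if a.EventIndex != b.EventIndex || a.Key != b.Key || !bytes.Equal(a.Value, b.Value) {
			return fmt.Errorf("row %d differs", i)
		}
	}
	return nil
}

func main() {
	expected := []EntryRow{{EventIndex: 0, Key: "t1", Value: []byte{1}}}
	indexed := []EntryRow{{EventIndex: 0, Key: "t1", Value: []byte{2}}}
	fmt.Println(CompareRows(indexed, expected))
}
```

Unlike a digest, this reports where the first mismatch is, at the cost of pulling every row out of the database.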
Another proposal that was raised at one point for this was to recalculate the AMT root of each of the messages' events and compare that to what the receipt says. That way we wouldn't even need to load the AMT from the blockstore and it may end up being more efficient. In theory we should have everything we need to do a lossless reconstruction. The only catch I see is that in ChainIndexer, as in the prior event storage code, we skip over any event where we can't look up the address for the actor ID. I don't know if in practice we have any epochs where this has ever happened, but it's a possibility in the code that such an event won't be stored.
I finally decided to investigate the practicality of AMT comparison: #12571
There are still some issues with addresses to resolve, and it's unfortunately slower than just counting, which I didn't expect. Some of that may be due to using the RPC from lotus-shed, and it may be quicker when done in-process.
Closed by #12632.