Unable to start Litecoin via snapshot since upgrading from 0.18.3 to 0.21.3
samclusker opened this issue · 1 comments
Issue:
Running: RPC Node
In 0.18.3 we could take a snapshot of the data disk (a separately attached disk) and use that snapshot to deploy a new machine and sync faster, which is pretty standard.
Since upgrading to 0.21.3, starting from a disk snapshot now fails: the litecoind process requires a reindex of blocks in all circumstances.
I ended up syncing from block 0 to get a healthy node, then snapshotted the disk and tried to start a fresh node from that snapshot, but it fails. Reverting the upgrade on a fresh node and snapshotting resolves the issue. Reporting as a bug since this behaviour only appears post-upgrade.
Expected Behaviour:
Block rewind and eventual syncing of final blocks missing from the snapshot.
Actual Behaviour:
The node rewinds several blocks before failing and requiring a reindex:
litecoin-1 | 2024-09-21T14:05:46Z Verifying last 24 blocks at level 3
litecoin-1 | 2024-09-21T14:05:46Z [0%]...ERROR: DisconnectBlock(): Failed to disconnect MWEB block
litecoin-1 | 2024-09-21T14:05:46Z ERROR: VerifyDB(): *** irrecoverable inconsistency in block data at 2759027, hash=ed909be0679ff0c2f8ba953a5885d29cbea87ffd4c9fd2dc50311d04b2a1419e
litecoin-1 | 2024-09-21T14:05:46Z : Corrupted block database detected.
litecoin-1 | Please restart with -reindex or -reindex-chainstate to recover.
litecoin-1 | : Corrupted block database detected.
litecoin-1 | Please restart with -reindex or -reindex-chainstate to recover.
litecoin-1 | 2024-09-21T14:05:46Z Aborted block database rebuild. Exiting.
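For anyone comparing several snapshot attempts, a minimal sketch of how the failing height and hash could be pulled out of the container output shown above. The function name `find_verifydb_failure` is hypothetical, and the pattern is written only against the exact `VerifyDB()` line in this report; other versions may word the message differently.

```python
import re
from typing import Optional, Tuple

# Pattern written against the VerifyDB() error line quoted in this issue.
# Assumption: the wording is stable across the runs being compared.
VERIFYDB_ERROR = re.compile(
    r"ERROR: VerifyDB\(\): \*\*\* irrecoverable inconsistency in block data "
    r"at (?P<height>\d+), hash=(?P<hash>[0-9a-f]+)"
)


def find_verifydb_failure(log_text: str) -> Optional[Tuple[int, str]]:
    """Return (height, block_hash) for the first VerifyDB failure, or None."""
    match = VERIFYDB_ERROR.search(log_text)
    if match is None:
        return None
    return int(match.group("height")), match.group("hash")


if __name__ == "__main__":
    sample = (
        "litecoin-1 | 2024-09-21T14:05:46Z ERROR: VerifyDB(): *** irrecoverable "
        "inconsistency in block data at 2759027, "
        "hash=ed909be0679ff0c2f8ba953a5885d29cbea87ffd4c9fd2dc50311d04b2a1419e"
    )
    print(find_verifydb_failure(sample))
```

If every snapshot fails at a height close to its own tip, that would be consistent with the problem being in the most recent blocks rather than in one specific block.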
This suggests that the final blocks may be the issue. As one troubleshooting attempt, I removed any block files that were dated just before snapshotting.
Reproducing Issue:
Since the upgrade, the issue is consistently reproducible, even though the 0.21.3 main node is fully synced. On multiple occasions I have shut down the process before snapshotting to rule out corrupted snapshots, but every snapshot fails in the same way.
We're running a GCP VM with an attached persistent disk, which is mounted as a volume. This disk is snapshotted for use by other nodes.
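The "shut down before snapshotting" step can be double-checked from the node's own log. The sketch below assumes that litecoind, like the Bitcoin Core code it derives from, writes `Shutdown: done` as the last `debug.log` line after a clean exit. That marker and the helper names are assumptions, not something confirmed in this report. It only confirms that the process had exited cleanly when the snapshot was taken, and does not explain the MWEB disconnect failure.

```python
from pathlib import Path

# Assumption: a clean litecoind exit ends debug.log with this marker,
# as Bitcoin Core 0.21 does. Adjust if your build logs something else.
CLEAN_SHUTDOWN_MARKER = "Shutdown: done"


def last_log_line(log_text: str) -> str:
    """Return the last non-blank line of the log text, or '' if there is none."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def shut_down_cleanly(log_text: str) -> bool:
    """True if the final log line ends with the clean-shutdown marker."""
    return last_log_line(log_text).endswith(CLEAN_SHUTDOWN_MARKER)


def safe_to_snapshot(debug_log: Path) -> bool:
    """Hypothetical pre-snapshot check against a debug.log on the data disk.

    Reads the whole file for simplicity; a large log would be better
    handled by reading only its tail.
    """
    return shut_down_cleanly(debug_log.read_text(errors="replace"))
```

A snapshot script could call `safe_to_snapshot` with the path to `debug.log` on the mounted data disk and refuse to proceed when it returns `False`.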
Version Used and Build Method:
0.21.3 built from source within a Dockerfile: https://github.com/flare-foundation/connected-chains-docker/blob/main/images/litecoind/Dockerfile
System Details:
GCP e2 virtual machine using a balanced persistent disk
Ubuntu 22.04 OS
Running with Docker