benbjohnson/litestream

Manual checkpoint before snapshot loses WAL position

Closed this issue · 0 comments

hifi commented

If the main application is not writing to the database, Litestream can lose its own WAL position: it checkpoints before taking a snapshot but does not write to the sequence table afterwards. Triggering this requires a very specific sequence of events:

  1. Start Litestream against a new database
  2. Litestream snapshots the initial generation
  3. Restart Litestream
  4. Litestream snapshots <- this is when the only page in the WAL is checkpointed away
  5. Write to database
  6. Litestream complains that the WAL was overwritten by another process, starts a new generation

The issue happens here:

litestream/replica.go

Lines 484 to 487 in 85ddf32

// Issue a passive checkpoint to flush any pages to disk before snapshotting.
if _, err := r.db.db.ExecContext(ctx, `PRAGMA wal_checkpoint(PASSIVE);`); err != nil {
	return info, fmt.Errorf("pre-snapshot checkpoint: %w", err)
}

Normally, when Litestream checkpoints, it also writes to the sequence table so that at least one page is in the WAL once the long-running read transaction has been restarted. That page then stays in the WAL regardless of which checkpoints run, until Litestream makes its next "full" checkpoint and writes to the sequence table again.

At step 2 the read transaction keeps a lock on the page in the WAL. By step 4, however, the read transaction has been restarted after the page was written by the previous incarnation of Litestream, so the page is free to be checkpointed from the WAL into the main database. The snapshot's rogue checkpoint silently moves it out, and Litestream does not notice until the next write, when it checks the last page and finds that it no longer matches the shadow WAL.

The code is meant to call r.db.Checkpoint(ctx, CheckpointModePassive) rather than checkpoint manually: Checkpoint performs the required write to the sequence table and still makes sure the database file is as up to date as possible before the snapshot is taken.