Panic after partial commit processing
Tpt opened this issue · 1 comments
Tpt commented
If the I/O fails in the middle of a commit processing it seems it possible to end up with a corrupted database.
Here is a test to reproduce the bug (with the instrumentation
feature enabled):
#[test]
fn test_from_fuzzer() {
crate::set_number_of_allowed_io_operations(20);
let tmp = tempdir().unwrap();
let mut options = Options::with_columns(tmp.path(), 1);
options.columns[0].btree_index = true;
let inner_options = InternalOptions {
create: true,
commit_stages: EnableCommitPipelineStages::CommitOverlay,
..Default::default()
};
let db = Db::open_inner(&options, &inner_options).unwrap();
db.commit::<_, Vec<u8>>(vec![(0, vec![250], Some(vec![0]))]).unwrap();
db.inner.flush_logs(0).unwrap();
db.commit::<_, Vec<u8>>(vec![(0, vec![250], None)]).unwrap();
assert!(db.inner.process_commits().is_err()); // Fails because of instrumentation
crate::set_number_of_allowed_io_operations(usize::MAX);
db.inner.process_commits().unwrap();
drop(db);
Db::open_or_create(&options).unwrap(); // Will panic
}
Log:
[2022-10-04T15:17:34Z DEBUG parity-db] Created value table 00-00
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Inserting into new slot 1
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Writing slot 1: no_hash
[2022-10-04T15:17:34Z DEBUG parity-db] Opened db Options { path: "/tmp/.tmptW0mVS", columns: [ColumnOptions { preimage: false, uniform: false, ref_counted: false, compression: NoCompression, btree_index: true }], sync_wal: true, sync_data: true, stats: true, salt: None, compression_threshold: {} }, metadata=Metadata { salt: [188, 75, 55, 221, 113, 40, 51, 252, 114, 89, 82, 214, 70, 205, 146, 31, 104, 128, 115, 132, 202, 238, 70, 97, 138, 42, 9, 135, 110, 90, 106, 103], version: 7, columns: [ColumnOptions { preimage: false, uniform: false, ref_counted: false, compression: NoCompression, btree_index: true }] }
[2022-10-04T15:17:34Z DEBUG parity-db] Replay is complete.
[2022-10-04T15:17:34Z DEBUG parity-db] Queued commit 2, 2 bytes
[2022-10-04T15:17:34Z DEBUG parity-db] Queued commit 3, 1 bytes
[2022-10-04T15:17:34Z DEBUG parity-db] Removed 2. Still queued commits 1 bytes
[2022-10-04T15:17:34Z DEBUG parity-db] Processing commit 2, record 1, 2 bytes
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Query slot 1
[2022-10-04T15:17:34Z TRACE parity-db] 0: Inserting new no_hash, size = 1
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Inserting into new slot 2
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Writing slot 2: no_hash
[2022-10-04T15:17:34Z TRACE parity-db] 0: Inserting new no_hash, size = 26
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Inserting into new slot 3
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Writing slot 3: no_hash
[2022-10-04T15:17:34Z TRACE parity-db] 0: Replacing no_hash
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Writing slot 1: no_hash
[2022-10-04T15:17:34Z DEBUG parity-db] Flush: Activated new writer 0
[2022-10-04T15:17:34Z DEBUG parity-db] Removed 1. Still queued commits 0 bytes
[2022-10-04T15:17:34Z DEBUG parity-db] Processing commit 3, record 2, 1 bytes
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Query slot 1
[2022-10-04T15:17:34Z TRACE parity-db] 0: Inserting new no_hash, size = 8
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Inserting into new slot 4
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Writing slot 4: no_hash
[2022-10-04T15:17:34Z TRACE parity-db] 0: Replacing no_hash
[2022-10-04T15:17:34Z TRACE parity-db] 00-00: Writing slot 1: no_hash
[2022-10-04T15:17:34Z DEBUG parity-db] Finalizing log record 2 (0 index, 3 value)
[2022-10-04T15:17:34Z DEBUG parity-db] Processed commit 3 (record 2), 1 ops, 87 bytes written
[2022-10-04T15:17:34Z DEBUG parity-db] Opened existing log /tmp/.tmptW0mVS/log0, first record_id = 1
[2022-10-04T15:17:34Z DEBUG parity-db] Opened log 0, record 1
[2022-10-04T15:17:34Z DEBUG parity-db] Opened value table 00-00 with 2 entries, entry_size=32
[2022-10-04T15:17:34Z DEBUG parity-db] Opened db Options { path: "/tmp/.tmptW0mVS", columns: [ColumnOptions { preimage: false, uniform: false, ref_counted: false, compression: NoCompression, btree_index: true }], sync_wal: true, sync_data: true, stats: true, salt: None, compression_threshold: {} }, metadata=Metadata { salt: [188, 75, 55, 221, 113, 40, 51, 252, 114, 89, 82, 214, 70, 205, 146, 31, 104, 128, 115, 132, 202, 238, 70, 97, 138, 42, 9, 135, 110, 90, 106, 103], version: 7, columns: [ColumnOptions { preimage: false, uniform: false, ref_counted: false, compression: NoCompression, btree_index: true }] }
[2022-10-04T15:17:34Z DEBUG parity-db] Replay: Activated log reader 0
[2022-10-04T15:17:34Z DEBUG parity-db] Replaying database log 0
[2022-10-04T15:17:34Z DEBUG parity-db] Enacting log 1
thread 'db::tests::test_from_fuzzer' panicked at 'index out of bounds: the len is 1 but the index is 2', src/db.rs:566:49
It seems that during log replay the log reader reads a column number "2" even if there is only a single column. I am not sure what causes the error, the log replay should validate log using the checksum and not read garbage data. I plan to investigate this further in the next few days.
Tpt commented
I have found the cause: the log checksum validation is only done when the END_RECORD
log entry is read. Before that LogReader::next
might return garbage without any explicit error.