tigerbeetle/tigerbeetle

lsm unit/fuzz testing

jamii commented
  • binary_search
  • bloom_filter
    • no false negatives
    • test the false positive rate (requires calculating the expected rate, which is hard without a Poisson distribution in the stdlib; see the sketch after this list)
  • k_way_merge
  • segmented_array
  • set_associative_cache
  • node_pool
  • superblock (done in #224)
  • superblock manifest encode/decode
  • superblock free set (done)
  • superblock client table encode/decode
    • (hold off on testing this for now; client table is going to be restructured as part of #212 - @sentientwaffle)
  • grid (depends on superblock, fifo, iops, set_associative_cache)
  • manifest (depends on grid, manifest_log, manifest_level, node_pool)
  • manifest level (depends on segmented_array, node_pool)
  • manifest log (done in #224)
  • table (depends on grid (BlockType), manifest (TableInfoType))
  • table_immutable (depends on binary_search, (table?))
  • table_mutable (depends on set_associative_cache, (table?))
  • tree (depends on binary_search, bloom_filter, node_pool, ring_buffer, eytzinger, grid, superblock)
    • get, put, remove, compact (see the model-based fuzz loop sketched after this list)
    • checkpoint
    • range queries
    • restarts
    • storage faults
    • concurrent operations
  • multiple trees, checking that storage is deterministic across differences in IO latency and across crash/recover-from-checkpoint (maybe use a StorageChecker, as the VOPR does)
  • groove (depends on table, tree, grid, composite_key, node_pool)
    • Run two grooves in parallel (separate Storage, etc.) and run the same operations against both. Since the storage latencies differ, IO is reordered, but at each checkpoint (and possibly after each compaction beat) the two "disks" should be byte-for-byte identical. (A toy version of this check is sketched after this list.)
    • Also check that the superblock manifest & free set are identical.
    • This same "parallel" test should be applied to the Forest as well.
  • posted_groove (depends on table, tree, grid, node_pool)
  • level_iterator (depends on table_iterator, ring_buffer, manifest, grid)
  • table_iterator (depends on ring_buffer, manifest, grid)
  • compaction (depends on grid, manifest, k_way_merge, table_iterator, level_iterator)
  • forest (depends on grid, node_pool, compaction)
    • get, put, compact
    • remove
    • checkpoint
    • secondary index get
    • range queries
    • restarts
    • storage faults
    • concurrent operations
    • verify storage determinism (see "groove" for explanation)
  • calculate an upper bound on storage size for a given workload, and test that we don't exceed it
  • state machine (and indirectly, the forest). Use the VOPR's Workload+auditor, but attach it directly to the state machine instead of to a VSR client.
  • journal (this isn't LSM, but it's still important, especially around wrapping). @sentientwaffle
  • add a flag to the global allocator that prevents allocation after init (one possible shape is sketched below)
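
A few sketches for the items above follow, all in Zig 0.11 syntax. First, the bloom_filter checks. The `Filter` stand-in, its Wyhash-based hashing, and the sizing constants below are illustrative assumptions, not the real module's API. The expected-rate check sidesteps the missing Poisson distribution in the stdlib: over N independent probes of never-inserted keys, the false-positive count is approximately Poisson with mean N*p, and a Poisson's variance equals its mean, so it is enough to compute p = (1 - e^(-kn/m))^k and allow a band of a few standard deviations around N*p.

```zig
const std = @import("std");

// Toy stand-in filter: m = 2^16 bits, k = 4 Wyhash-seeded hash functions.
const Filter = struct {
    const bits = 1 << 16;
    const k = 4;

    words: [bits / 64]u64 = [_]u64{0} ** (bits / 64),

    fn indexes(key: u64) [k]u32 {
        var out: [k]u32 = undefined;
        for (&out, 0..) |*index, seed| {
            const hash = std.hash.Wyhash.hash(seed, std.mem.asBytes(&key));
            index.* = @intCast(hash % bits);
        }
        return out;
    }

    fn insert(filter: *Filter, key: u64) void {
        for (indexes(key)) |i| {
            filter.words[i / 64] |= @as(u64, 1) << @intCast(i % 64);
        }
    }

    fn contains(filter: *const Filter, key: u64) bool {
        for (indexes(key)) |i| {
            if (filter.words[i / 64] & (@as(u64, 1) << @intCast(i % 64)) == 0) return false;
        }
        return true;
    }
};

/// Expected false-positive rate: p = (1 - e^(-k*n/m))^k.
fn fp_rate_expected(m: f64, k: f64, n: f64) f64 {
    return std.math.pow(f64, 1.0 - @exp(-k * n / m), k);
}

test "bloom filter: no false negatives, false positive rate near expected" {
    var prng = std.rand.DefaultPrng.init(92);
    const random = prng.random();

    var filter = Filter{};
    const n = 4096;
    var i: u64 = 0;
    while (i < n) : (i += 1) filter.insert(i);

    // No false negatives: every inserted key must be found.
    i = 0;
    while (i < n) : (i += 1) try std.testing.expect(filter.contains(i));

    // Keys >= n were never inserted, so every hit below is a false positive.
    // The count over `probes` queries is ~Poisson(probes * p): mean = variance.
    const probes = 100_000;
    var false_positives: f64 = 0;
    var probe: u64 = 0;
    while (probe < probes) : (probe += 1) {
        if (filter.contains(@as(u64, n) + random.int(u32))) false_positives += 1;
    }
    const mean = @as(f64, probes) * fp_rate_expected(Filter.bits, Filter.k, n);
    try std.testing.expect(@fabs(false_positives - mean) < 6.0 * @sqrt(mean));
}
```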
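
Next, the shape shared by the operation fuzzers (the tree's and forest's get/put/remove bullets): a seed-reproducible loop that applies random operations both to the system under test and to a trivially correct model, asserting agreement as it goes. This sketch shows only the harness shape; a second hash map stands in for the tree:

```zig
const std = @import("std");

test "model-based fuzz loop (shape only)" {
    const gpa = std.testing.allocator;
    // A fixed seed makes the run reproducible; a real fuzzer would take the
    // seed from the command line and print it on failure.
    var prng = std.rand.DefaultPrng.init(0xdeadbeef);
    const random = prng.random();

    var system = std.AutoHashMap(u32, u32).init(gpa); // stand-in for the tree
    defer system.deinit();
    var model = std.AutoHashMap(u32, u32).init(gpa); // trivially correct model
    defer model.deinit();

    var op: u32 = 0;
    while (op < 10_000) : (op += 1) {
        // A small key space forces puts, removes, and gets to interact.
        const key = random.uintLessThan(u32, 64);
        switch (random.uintLessThan(u8, 3)) {
            0 => {
                const value = random.int(u32);
                try system.put(key, value);
                try model.put(key, value);
            },
            1 => try std.testing.expectEqual(model.remove(key), system.remove(key)),
            else => try std.testing.expectEqual(model.get(key), system.get(key)),
        }
    }
}
```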
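
A toy version of the groove/forest determinism check. The real test runs two grooves over simulated Storage with different latency seeds; here, write reordering stands in for IO reordering, and the final comparison stands in for the checkpoint barrier:

```zig
const std = @import("std");

// A logical write to a fixed disk offset (in the real test, a grid block write).
const Write = struct { offset: usize, data: u8 };

fn apply(disk: []u8, writes: []const Write, order: []const usize) void {
    for (order) |i| disk[writes[i].offset] = writes[i].data;
}

test "reordered IO still yields identical disks at the checkpoint" {
    var disk_a = [_]u8{0} ** 32;
    var disk_b = [_]u8{0} ** 32;

    const writes = [_]Write{
        .{ .offset = 3, .data = 0xAB },
        .{ .offset = 17, .data = 0xCD },
        .{ .offset = 9, .data = 0xEF },
    };

    // Disk A completes writes in submission order; disk B's "latencies"
    // deliver the same writes in a different order.
    apply(&disk_a, &writes, &[_]usize{ 0, 1, 2 });
    apply(&disk_b, &writes, &[_]usize{ 2, 0, 1 });

    // The checkpoint barrier: once both disks have drained, they must be
    // byte-for-byte identical.
    try std.testing.expectEqualSlices(u8, &disk_a, &disk_b);
}
```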
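
Finally, one possible shape for the no-allocation-after-init flag: a wrapper that forwards to a parent allocator but panics on alloc/resize once the flag is set. The StaticAllocator name and this vtable-wrapper design are assumptions, not existing code:

```zig
const std = @import("std");

const StaticAllocator = struct {
    parent: std.mem.Allocator,
    init_done: bool = false,

    fn allocator(self: *StaticAllocator) std.mem.Allocator {
        return .{
            .ptr = self,
            .vtable = &.{ .alloc = alloc, .resize = resize, .free = free },
        };
    }

    fn alloc(ctx: *anyopaque, len: usize, ptr_align: u8, ret_addr: usize) ?[*]u8 {
        const self: *StaticAllocator = @ptrCast(@alignCast(ctx));
        if (self.init_done) @panic("allocation after init");
        return self.parent.rawAlloc(len, ptr_align, ret_addr);
    }

    fn resize(ctx: *anyopaque, buf: []u8, buf_align: u8, new_len: usize, ret_addr: usize) bool {
        const self: *StaticAllocator = @ptrCast(@alignCast(ctx));
        if (self.init_done) @panic("resize after init");
        return self.parent.rawResize(buf, buf_align, new_len, ret_addr);
    }

    // Frees stay permitted after init so that deinit keeps working.
    fn free(ctx: *anyopaque, buf: []u8, buf_align: u8, ret_addr: usize) void {
        const self: *StaticAllocator = @ptrCast(@alignCast(ctx));
        self.parent.rawFree(buf, buf_align, ret_addr);
    }
};

test "allocation is allowed during init; frees stay legal afterwards" {
    var static = StaticAllocator{ .parent = std.testing.allocator };
    const gpa = static.allocator();

    const buffer = try gpa.alloc(u8, 64); // fine: init is not finished yet
    static.init_done = true;
    gpa.free(buffer); // still fine; only alloc/resize panic from here on
}
```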

Once we have good fuzz coverage and have fixed all the quickly-found bugs, we'll probably want some VOPR-like infrastructure to fuzz in the background and post issues too.