speedb-io/speedb

Getting an iterator error on db_stress

Closed this issue · 20 comments

How to reproduce:

mkdir /tmp/rocksdb_crashtest_expected_b1cqsg8_
run this command:
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --allow_wbm_stalls=0 --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=2 --block_size=16384 --bloom_bits=5.680994457101304 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=4 --compaction_ttl=0 --compare_full_db_state_snapshot=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=lz4 --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --crash_test=1 --customopspercent=0 --data_block_index_type=0 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --db_write_buffer_size=0 --delpercent=17 --delrangepercent=5 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_speedb_features=1 --enable_thread_tracking=1 --expected_values_dir=/tmp/rocksdb_crashtest_expected_b1cqsg8_ --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=none --flush_one_in=1000000 --format_version=6 --get_current_wal_file_one_in=0 --get_live_files_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=5 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=0 --initiate_wbm_flushes=1 --iterpercent=17 --key_len_percent_dist=100 --kill_random_test=888887 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=1000000 --long_running_snapshots=0 --manual_wal_flush_one_in=1000000 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_background_jobs=4 --max_bytes_for_level_base=67108864 --max_key=102400 --max_key_len=1 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=1048576 --memtable_max_range_deletions=1000 --memtable_prefix_bloom_size_ratio=0.1 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=False --nooverwritepercent=5 --num_file_reads_for_auto_readahead=0 --num_iterations=60 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --pinning_policy=scoped --prefix_size=1 --prefixpercent=6 --prepopulate_block_cache=0 --preserve_internal_time_seconds=36000 --progress_reports=0 --read_fault_one_in=32 --readahead_size=524288 --readpercent=20 --recycle_log_file_num=0 --reopen=20 --ribbon_starting_level=999 --secondary_cache_fault_one_in=32 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --seed=2468986424 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --start_delay_percent=70 --stats_dump_period_sec=10 --subcompactions=2 --sync=0 --sync_fault_injection=0 --sync_wal_one_in=100000 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --total_ram_size=1073741824 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_dynamic_delay=1 --use_full_merge_v1=False --use_get_entity=0 --use_merge=1 --use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_before_write=False --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35

run this command:

./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_wbm_stalls=0 --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=8 --block_protection_bytes_per_key=1 --block_size=16384 --bloom_bits=2 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_size=8388608 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=0 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=4 --compaction_ttl=0 --compare_full_db_state_snapshot=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=2097151 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --crash_test=1 --customopspercent=0 --data_block_index_type=0 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --db_write_buffer_size=1048576 --delpercent=28 --delrangepercent=4 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_speedb_features=1 --enable_thread_tracking=0 --expected_values_dir=/tmp/rocksdb_crashtest_expected_b1cqsg8_ --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --flush_one_in=1000000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=3 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --initiate_wbm_flushes=0 --iterpercent=3 --key_len_percent_dist=100 --kill_exclude_prefixes=WritableFileWriter::Append,WritableFileWriter::WriteBuffered --kill_random_test=88889 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=1000000 --long_running_snapshots=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_background_jobs=8 --max_bytes_for_level_base=67108864 --max_key=102400 --max_key_len=1 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=1048576 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=5 --num_file_reads_for_auto_readahead=2 --num_iterations=19 --open_files=500000 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --pinning_policy=default --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=32 --readahead_size=524288 --readpercent=30 --recycle_log_file_num=0 --reopen=20 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --seed=2468986424 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --start_delay_percent=0 --stats_dump_period_sec=600 --subcompactions=3 --sync=1 --sync_fault_injection=0 --sync_wal_one_in=100000 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=1 --total_ram_size=536870912 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_dynamic_delay=1 --use_full_merge_v1=False --use_get_entity=0 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_before_write=False --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35

You should get the error similar to this:

2023-12-28T14:37:27.4899757Z Expected state has key 0000000000005982, iterator is at key 0000000000005966
2023-12-28T14:37:27.4901103Z Column family: default, op_logs: S 000000000000597D N SFP 000000000000598F P S 0000000000005986 NPNPP
2023-12-28T14:37:27.4902028Z Verification failed :(

  1. run with the flag kill_random_test=0 this will prevent db_stress from killing the test and it will failed constantly
  2. i used ./ldb scan --from="0x0000000000001080" --to="0x0000000000001100" --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno/ --hex --no_value to get a full list of all the keys in the range ... the expected key is in the database

the relevant dump is:
"0x0000000000001089
0x000000000000108A
0x000000000000108B
0x000000000000108C
0x000000000000108D
0x000000000000108F
0x0000000000001092
0x0000000000001093
0x0000000000001094
0x0000000000001097
0x0000000000001098
0x00000000000010A3
0x00000000000010A4
0x00000000000010A6"

based on the database dump, SFP 00000000000010A0 should return 0000000000001098 and now "NNPPPNNPPPP" (N=next P = Prev) should return 0000000000001093 (4 next and 7 prev = 3 prev ). need to check my logic and if items were added after the iterator started.
@Yuval-Ariel where is the patch that enable output of a trace file ...

Opening the database again pass the verification phase. hence the problem is probably on deciding which entries are part of the iterator.

@hilikspdb 6ba2441
its on by default and the trace files are stored in the expected_values_dir

thanks , anyway the second test fail even when it starts on an empty database .

I have simplified the test:
rm -rf /tmp/rocksdb*/*; ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --expected_values_dir=/tmp/rocksdb_crashtest_expected_b1cqsg8_ --destroy_db_initially=1 --key_len_percent_dist=100 --kill_random_test=0 --max_key=102400 --max_key_len=1 --memtablerep=skip_list --seed=2468986424 --verify_iterator_with_expected_state_one_in=5 --write_buffer_size=33554432 --iterpercent=3 --readpercent=30 --writepercent=35 --delpercent=28 --delrangepercent=4 --prefixpercent=0

this test pass ok with memtable=skip-list and in 2.6 while crushes in main with memtablerep=hash_spdb ... therefore we left we two options : bug in translation or a change in the behaviour of db_stress

here is even simpler version:
rm -rf /tmp/rocksdb*/*; ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --expected_values_dir=/tmp/rocksdb_crashtest_expected_b1cqsg8_ --destroy_db_initially=1 --key_len_percent_dist=100 --kill_random_test=0 --max_key=102400 --max_key_len=1 --memtablerep=hash_spdb --enable_speedb_features=false --seed=2468986424 --verify_iterator_with_expected_state_one_in=5 --write_buffer_size=33554432 --iterpercent=3 --readpercent=30 --writepercent=67 --delpercent=0 --delrangepercent=0 --prefixpercent=0

This commit : 6213b63 seems to introduce the issue

This commit : 6213b63 seems to introduce the issue

@ofriedma - that is a commit that was made before version 2.3.0?

this commit looks a force push commit that was done to the plugin which was remove later on. any way it looks that changes were done to my commit which i didnt do and forced push. also there is an issue with db stress and enable speedb features as i mentioned in the past since the shared options doesnt support hardening. currently there is a wrong way how the db stress handles it. i will attach a branch that handles both cases and you can proceed from there

For 2.7 the error does not appear, so it has been revealed somewhere between 2.7 and main.

@ofriedma - Are you referring to the rebased branch or to the official 2.7?
And please clarify: You have said that you have identified the commit that is causing this, but this commit was made before version 2.3.
I am confused.

official 2.7, It appears that it happened probably after the rebase, before v2.7 no errors(and the commit was there).

@ayulas fixs are at branch /ayula/fix_bug
@ofriedma @hilikspdb pls review them

@erez-speedb
This issue is not related to db_stress enable_speedb_features becuase we can reproduce it without this feature entirely.

Here is a quick description of the bug and what can be done:

  • write starts at time t and finishes at t+n
  • iterators starts at time t+k where k < n
  • seek is done at time t+x
  • db_stress internal locking mechanism prevents the SEEK to start before the end of the write.
  • The skip list iterator uses the mutable copy and the iterator sequence number (so data that were added to the memtable after the new iterator is in)
  • vector and hash-spdb are using an immutable copy (so data is not added)
    NOTE: this is not a bug in the memtable the definition of what to do with data in flight is not clear .
    There are two options to fix it:
  1. fix db_bench so new iterator will also use lock of the region that it is doing seek on
  2. mimic skip-list behaviour add the last vector in the time of seek and use the sequence number to decide if data is in or not.
  1. As i said data that is written while the iterator is created should be in "unkown" state. The issue is that the lock is done during the first seek and not in the creation of the iterator (the lock is done on a range of data). this is a conceptual issue with the db_stress and the best place to fix it is there .
  2. Yes, this is the place to fix
  3. As far as i know there is no support for tailing iterators ..

the issue is a result of the Refresh iterator that by definition is not suppose to be supported in the immutable iterator. (hash_spdb, vectors memtablerep)
i added the supoorted flag to the memtablerep and if it is false we avoid doing refresh in no_batched_ops_stress test
@hilikspdb @ofriedma pls review
ayula/802-refresh-iter

@ofriedma as we discussed i expose the allow refresh pls review
cc249df