jermp/fulgor

GGCAT panics because assertion `left == right` failed

Closed this issue · 2 comments

Hello,

I was able to run fulgor on a small toy dataset (~100 genomes), but I am unable to run it on a larger dataset of >10,000 genomes from various bacterial species: GGCAT panics because of a failed assertion. I also tried smaller subsamples; if the number of genomes is sufficiently low (~1,000), the build runs without error. Here is an example with 4,000 genomes:

Command:

build -l reference_list.txt -o index-k29m20g8 -k 29 -m 20 -d tmp_dir -g 64 -t 32 --verbose --meta

Output:

2024-08-27 14:36:37: step 1. build colored compacted dBG
about to process 4000 files...
Allocator initialized: mem: 64 GiB chunks: 262144 log2: 18
Started phase: reads bucketing prev stats:
Elaborated 201407 sequences! [9984012071 | 99.81% qb] (2005[2006]/4000 => 50.12%)  ptime: 16.64s gtime: 16.65s
Temp buckets files size: 2.01 MiB
Finished phase: reads bucketing. phase duration: 23.87s gtime: 23.89s
Started phase: kmers merge prev stats:
Processing bucket 46 of [1024[R:8050]]  ptime: 10.40s gtime: 34.28s phase eta: 262s est. tot: 273s
Processing bucket 72 of [1024[R:12400]]  ptime: 20.60s gtime: 44.48s phase eta: 314s est. tot: 335s
thread 'km_comp' panicked at /path/to/fulgor/external/ggcat/libs-crates/parallel-processor-rs/src/buckets/readers/generic_binary_reader.rs:73:17:
assertion `left == right` failed
  left: 1877980
 right: 1877976
stack backtrace:
   0: rust_begin_unwind
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/panicking.rs:408:17
   3: core::panicking::assert_failed
   4: <parallel_processor::buckets::readers::generic_binary_reader::SequentialReader<D> as std::io::Read>::read
   5: <ggcat_colors::managers::multiple::SequencesStorageStream as std::io::Read>::read
   6: <ggcat_colors::managers::multiple::MultipleColorsManager<H,MH> as ggcat_colors::colors_manager::ColorsMergeManager<H,MH>>::process_colors
   7: <ggcat_assembler_kmerge::final_executor::ParallelKmersMergeFinalExecutor<H,MH,CX> as ggcat_kmers_transform::KmersTransformFinalExecutor<ggcat_assembler_kmerge::ParallelKmersMergeFactory<H,MH,CX>>>::proces
   8: parallel_processor::execution_manager::thread_pool::ExecThreadPool::register_executors::{{closure}}
   9: tokio::runtime::task::core::Core<T,S>::poll
  10: tokio::runtime::task::harness::Harness<T,S>::poll
  11: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  12: tokio::runtime::scheduler::multi_thread::worker::Context::run
  13: tokio::runtime::context::scoped::Scoped<T>::set
  14: tokio::runtime::context::runtime::enter_runtime
  15: tokio::runtime::scheduler::multi_thread::worker::run
  16: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  17: tokio::runtime::task::core::Core<T,S>::poll
  18: tokio::runtime::task::harness::Harness<T,S>::poll
  19: tokio::runtime::blocking::pool::Inner::run
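In case it helps with reproducing, this is roughly how I generate the smaller reference lists mentioned above (a minimal sketch: the subsampled file name is hypothetical, and I assume reference_list.txt has one FASTA path per line):

# randomly pick 4,000 genome paths from the full list (file name is hypothetical)
shuf -n 4000 reference_list.txt > reference_list_4000.txt
# rerun the same build command on the subsample
build -l reference_list_4000.txt -o index-k29m20g8-sub4000 -k 29 -m 20 -d tmp_dir -g 64 -t 32 --verbose --meta

At ~1,000 paths the build completes; at 4,000 it panics as shown above.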

Have you ever experienced something similar? Would you suggest opening an issue in GGCAT's repository?

One extra question: what do you think about using Fulgor on a diverse set of genomes (e.g., 60,000 genomes from 20,000 bacterial species)?

Thanks!

rob-p commented

I've not experienced this when using fulgor before, but you may want to post this issue in the GGCAT repo (the tool is developed and maintained by a different group), as the issue seems to be happening during GGCAT cDBG construction rather than during fulgor indexing.

Sounds good, will do. Thanks.