GGCAT panics because assertion `left == right` failed
Closed this issue · 2 comments
Hello,
I was able to run fulgor on a small toy dataset (~100 genomes), but I am unable to run it on a dataset of more than 10,000 genomes from various bacterial species: GGCAT panics because of a failed assertion. I also tried smaller subsamples. If the number of genomes is sufficiently low (~1,000), it runs without an error. Here is an example with 4,000 genomes:
Command:
build -l reference_list.txt -o index-k29m20g8 -k 29 -m 20 -d tmp_dir -g 64 -t 32 --verbose --meta
2024-08-27 14:36:37: step 1. build colored compacted dBG
about to process 4000 files...
Allocator initialized: mem: 64 GiB chunks: 262144 log2: 18
Started phase: reads bucketing prev stats:
Elaborated 201407 sequences! [9984012071 | 99.81% qb] (2005[2006]/4000 => 50.12%) ptime: 16.64s gtime: 16.65s
Temp buckets files size: 2.01 MiB
Finished phase: reads bucketing. phase duration: 23.87s gtime: 23.89s
Started phase: kmers merge prev stats:
Processing bucket 46 of [1024[R:8050]] ptime: 10.40s gtime: 34.28s phase eta: 262s est. tot: 273s
Processing bucket 72 of [1024[R:12400]] ptime: 20.60s gtime: 44.48s phase eta: 314s est. tot: 335s
thread 'km_comp' panicked at /path/to/fulgor/external/ggcat/libs-crates/parallel-processor-rs/src/buckets/readers/generic_binary_reader.rs:73:17:
assertion `left == right` failed
left: 1877980
right: 1877976
stack backtrace:
0: rust_begin_unwind
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/std/src/panicking.rs:652:5
1: core::panicking::panic_fmt
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/panicking.rs:72:14
2: core::panicking::assert_failed_inner
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/panicking.rs:408:17
3: core::panicking::assert_failed
4: <parallel_processor::buckets::readers::generic_binary_reader::SequentialReader<D> as std::io::Read>::read
5: <ggcat_colors::managers::multiple::SequencesStorageStream as std::io::Read>::read
6: <ggcat_colors::managers::multiple::MultipleColorsManager<H,MH> as ggcat_colors::colors_manager::ColorsMergeManager<H,MH>>::process_colors
7: <ggcat_assembler_kmerge::final_executor::ParallelKmersMergeFinalExecutor<H,MH,CX> as ggcat_kmers_transform::KmersTransformFinalExecutor<ggcat_assembler_kmerge::ParallelKmersMergeFactory<H,MH,CX>>>::proces
8: parallel_processor::execution_manager::thread_pool::ExecThreadPool::register_executors::{{closure}}
9: tokio::runtime::task::core::Core<T,S>::poll
10: tokio::runtime::task::harness::Harness<T,S>::poll
11: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
12: tokio::runtime::scheduler::multi_thread::worker::Context::run
13: tokio::runtime::context::scoped::Scoped<T>::set
14: tokio::runtime::context::runtime::enter_runtime
15: tokio::runtime::scheduler::multi_thread::worker::run
16: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
17: tokio::runtime::task::core::Core<T,S>::poll
18: tokio::runtime::task::harness::Harness<T,S>::poll
19: tokio::runtime::blocking::pool::Inner::run
Have you ever experienced something similar to this? Do you suggest I should create an issue in GGCAT's repository?
One extra question: what do you think about using Fulgor with a diverse set of genomes (e.g., 60,000 genomes coming from 20,000 bacterial species)?
Thanks!
I've not experienced this when using fulgor before, but you may want to post this issue in the GGCAT repo (the tool is developed and maintained by a different group), as the issue seems to be happening during GGCAT cDBG construction, rather than fulgor indexing.
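If it helps when filing the report upstream, a smaller input that still triggers the panic is usually easier to debug. Here is a minimal sketch for drawing reproducible random subsamples of the reference list. It assumes `reference_list.txt` contains one genome path per line; the function name and the output file name are hypothetical and not part of fulgor or GGCAT.

```python
import random
from pathlib import Path


def subsample_reference_list(src, dst, n, seed=0):
    """Write n randomly chosen entries of src to dst and return them.

    Assumes src lists one genome path per line (blank lines are skipped).
    A fixed seed makes the subsample reproducible across runs.
    """
    paths = [line.strip() for line in Path(src).read_text().splitlines() if line.strip()]
    if n > len(paths):
        raise ValueError(f"requested {n} entries but the list has only {len(paths)}")
    rng = random.Random(seed)
    chosen = rng.sample(paths, n)  # sampling without replacement
    Path(dst).write_text("\n".join(chosen) + "\n")
    return chosen
```

For example, `subsample_reference_list("reference_list.txt", "reference_list_2000.txt", 2000)` writes a 2,000-genome list that can be passed to `-l` in the command above. Trying sizes between the passing (~1,000) and failing (4,000) runs should narrow down where the assertion starts to fail.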
Sounds good, will do. Thanks.