mikolmogorov/Flye

Flye running for over five days on a 3 Gb genome.

Opened this issue · 2 comments

Hello,

I am trying to run Flye on a 2.9 Gb fish genome at ~30X coverage. The step after "Median overlap divergence" has been running for 6 days now. I am using raw ONT reads as input, with the command given below.

flye --out-dir . --threads 60 --nano-raw file_merged.fastq.gz --min-ovlp 9000

[2024-09-22 11:58:09] INFO: Starting Flye 2.9-b1768

[2024-09-22 11:58:09] INFO: >>>STAGE: configure
[2024-09-22 11:58:09] INFO: Configuring run
[2024-09-22 12:35:13] INFO: Total read length: 91408995728
[2024-09-22 12:35:13] INFO: Reads N50/N90: 15078 / 8750
[2024-09-22 12:35:13] INFO: Minimum overlap set to 9000
[2024-09-22 12:35:14] INFO: >>>STAGE: assembly
[2024-09-22 12:35:14] INFO: Assembling disjointigs
[2024-09-22 12:35:14] INFO: Reading sequences
[2024-09-22 13:08:00] INFO: Counting k-mers:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2024-09-22 15:06:12] INFO: Filling index table (1/2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2024-09-22 17:47:54] INFO: Filling index table (2/2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2024-09-22 19:36:03] INFO: Extending reads
[2024-09-23 01:02:15] INFO: Overlap-based coverage: 27
[2024-09-23 01:02:15] INFO: Median overlap divergence: 0.0910143
0% 10% 20% 30% 40% 50% 60%
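
As a quick sanity check on the numbers above, the expected read depth can be estimated by dividing the "Total read length" reported in the log by the genome size. This is only a rough sketch: the genome size is the approximate 2.9 Gb figure stated in the post, and `estimated_coverage` is a hypothetical helper name, not part of Flye.

```python
# Values copied from the post and log above.
total_read_length = 91_408_995_728  # "Total read length" line in the Flye log
genome_size = 2.9e9                 # approximate genome size stated in the post (2.9 Gb)


def estimated_coverage(total_bases, genome_bases):
    """Return mean read depth: total sequenced bases divided by genome size."""
    return total_bases / genome_bases


coverage = estimated_coverage(total_read_length, genome_size)
print(f"Estimated coverage: {coverage:.1f}X")
```

The result is in line with the ~30X stated in the post, and a little above the overlap-based coverage of 27 that Flye reports.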

I am not sure why this is happening, as human genomes typically do not take this long. Are there any parameters I should modify to improve the speed?

Thanks in advance.

marade commented

I am also seeing unusually long runs lately, for no apparent reason. I've seen it with multiple samples.

Hi Marade,

Is there a specific step that is taking longer? And are there any errors at the end, or does the run complete successfully?
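
One way to see which step is taking longer is to compute the time between consecutive timestamped lines in the Flye log. The sketch below assumes the `[YYYY-MM-DD HH:MM:SS] INFO: message` line format seen in the log posted above. `step_durations` is a hypothetical helper written for illustration, not something Flye provides, and the sample text is an excerpt of that same log.

```python
import re
from datetime import datetime

# Matches lines such as "[2024-09-22 13:08:00] INFO: Counting k-mers:".
# Assumption: every timestamped line follows this layout.
LOG_LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] INFO: (.*)$")


def step_durations(log_text):
    """Return (message, timedelta) pairs: time from each logged line to the next."""
    entries = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:  # progress-bar lines have no timestamp and are skipped
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((stamp, match.group(2)))
    return [
        (message, next_stamp - stamp)
        for (stamp, message), (next_stamp, _) in zip(entries, entries[1:])
    ]


# Excerpt of the log from the original post.
sample = """\
[2024-09-22 13:08:00] INFO: Counting k-mers:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2024-09-22 15:06:12] INFO: Filling index table (1/2)
[2024-09-22 17:47:54] INFO: Filling index table (2/2)
[2024-09-22 19:36:03] INFO: Extending reads
[2024-09-23 01:02:15] INFO: Overlap-based coverage: 27
"""

durations = step_durations(sample)
for message, delta in durations:
    print(f"{delta}  {message}")
```

The last timestamped line never gets a duration because nothing follows it, so a step that is still running (or hung) shows up as the final entry in the log rather than in this list.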