Flye cannot assemble a contig for high coverage amplicon data
mbdabrowska1 opened this issue · 3 comments
Hi, I'm running your tool on my ONT HIV amplicon data, but the pipeline fails at the Flye step:
[ ** STEP 3 ** ]get consensus and HIVDB report
/opt/bin/ClusterV/cv
NC_001802.1 1552 4810
CMD: mkdir -p /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus
checking subtype 1
CMD: flye --nano-raw /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_ori_r.fasta --threads 8 --out-dir /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_flye -m 1000 -g 5k >/dev/null 2>&1
1
Traceback (most recent call last):
  File "/opt/bin/ClusterV/cv.py", line 68, in <module>
    main()
  File "/opt/bin/ClusterV/cv.py", line 64, in main
    submodule.main()
  File "/opt/bin/ClusterV/cv/ClusterV.py", line 107, in main
    run_get_consensus(args)
  File "/opt/bin/ClusterV/cv/get_consensus.py", line 314, in run_get_consensus
    _run_command(cmd)
  File "/opt/bin/ClusterV/shared/utils.py", line 50, in _run_command
    stderr = result.stderr
When I ran Flye on its own to check the log and find the reason for the failure, I got the following error:
[2023-11-20 15:16:39] root: INFO: >>>STAGE: contigger
[2023-11-20 15:16:39] root: INFO: Generating contigs
[2023-11-20 15:16:39] root: DEBUG: -----Begin contigger analyser log------
[2023-11-20 15:16:39] root: DEBUG: Running: flye-modules contigger --graph-edges /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/flye_test/20-repeat/repeat_graph_edges.fasta --reads /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_ori_r.fasta --out-dir /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/flye_test/30-contigger --config /opt/conda/envs/clusterV/lib/python3.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg --repeat-graph /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/flye_test/20-repeat/repeat_graph_dump --graph-aln /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/flye_test/20-repeat/read_alignment_dump --log /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/flye_test/flye.log --threads 8 --min-ovlp 1000
[2023-11-20 15:16:39] DEBUG: Build date: Feb 22 2022 03:24:00
[2023-11-20 15:16:39] DEBUG: Total RAM: 1007 Gb
[2023-11-20 15:16:39] DEBUG: Available RAM: 900 Gb
[2023-11-20 15:16:39] DEBUG: Total CPUs: 64
[2023-11-20 15:16:39] DEBUG: Loading /opt/conda/envs/clusterV/lib/python3.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg
[2023-11-20 15:16:39] DEBUG: Loading /opt/conda/envs/clusterV/lib/python3.7/site-packages/flye/config/bin_cfg/asm_defaults.cfg
[2023-11-20 15:16:39] DEBUG: big_genome_threshold=29000000
[2023-11-20 15:16:39] DEBUG: meta_read_filter_kmer_freq=100
[2023-11-20 15:16:39] DEBUG: chain_large_gap_penalty=2
[2023-11-20 15:16:39] DEBUG: chain_small_gap_penalty=0.5
[2023-11-20 15:16:39] DEBUG: chain_gap_jump_threshold=100
[2023-11-20 15:16:39] DEBUG: max_coverage_drop_rate=5
[2023-11-20 15:16:39] DEBUG: max_extensions_drop_rate=5
[2023-11-20 15:16:39] DEBUG: chimera_window=100
[2023-11-20 15:16:39] DEBUG: chimera_overhang=1000
[2023-11-20 15:16:39] DEBUG: min_reads_in_disjointig=4
[2023-11-20 15:16:39] DEBUG: max_inner_reads=10
[2023-11-20 15:16:39] DEBUG: max_inner_fraction=0.25
[2023-11-20 15:16:39] DEBUG: max_separation=500
[2023-11-20 15:16:39] DEBUG: unique_edge_length=50000
[2023-11-20 15:16:39] DEBUG: min_repeat_res_support=0.51
[2023-11-20 15:16:39] DEBUG: out_paths_ratio=5
[2023-11-20 15:16:39] DEBUG: graph_cov_drop_rate=5
[2023-11-20 15:16:39] DEBUG: coverage_estimate_window=100
[2023-11-20 15:16:39] DEBUG: max_bubble_length=50000
[2023-11-20 15:16:39] DEBUG: loop_coverage_rate=1.5
[2023-11-20 15:16:39] DEBUG: repeat_edge_cov_mult=1.75
[2023-11-20 15:16:39] DEBUG: weak_detach_rate=5
[2023-11-20 15:16:39] DEBUG: tip_coverage_rate=2
[2023-11-20 15:16:39] DEBUG: tip_length_rate=2
[2023-11-20 15:16:39] DEBUG: output_gfa_before_rr=0
[2023-11-20 15:16:39] DEBUG: low_cutoff_warning=1
[2023-11-20 15:16:39] DEBUG: kmer_size=17
[2023-11-20 15:16:39] DEBUG: use_minimizers=0
[2023-11-20 15:16:39] DEBUG: reads_base_alignment=0
[2023-11-20 15:16:39] DEBUG: meta_read_top_kmer_rate=0.40
[2023-11-20 15:16:39] DEBUG: maximum_jump=1500
[2023-11-20 15:16:39] DEBUG: maximum_overhang=1500
[2023-11-20 15:16:39] DEBUG: repeat_kmer_rate=100
[2023-11-20 15:16:39] DEBUG: assemble_ovlp_divergence=0.10
[2023-11-20 15:16:39] DEBUG: assemble_divergence_relative=1
[2023-11-20 15:16:39] DEBUG: repeat_graph_ovlp_divergence=0.08
[2023-11-20 15:16:39] DEBUG: read_align_ovlp_divergence=0.25
[2023-11-20 15:16:39] DEBUG: hpc_scoring_on=0
[2023-11-20 15:16:39] DEBUG: add_unassembled_reads=0
[2023-11-20 15:16:39] DEBUG: extend_contigs_with_repeats=0
[2023-11-20 15:16:39] DEBUG: min_read_cov_cutoff=3
[2023-11-20 15:16:39] DEBUG: short_tip_length=20000
[2023-11-20 15:16:39] DEBUG: long_tip_length=100000
[2023-11-20 15:16:39] DEBUG: Running with k-mer size: 17
[2023-11-20 15:16:39] DEBUG: Selected minimum overlap 1000
[2023-11-20 15:16:39] INFO: Reading sequences
[2023-11-20 15:16:39] DEBUG: Building positional index
[2023-11-20 15:16:39] DEBUG: Total sequence: 1821017 bp
[2023-11-20 15:16:39] DEBUG: Flipped 0
[2023-11-20 15:16:39] DEBUG: Final graph contains 0 egdes
[2023-11-20 15:16:39] DEBUG: Extending contigs into repeats
[2023-11-20 15:16:39] DEBUG: Covered 0 repetitive contigs
[2023-11-20 15:16:39] INFO: Generated 0 contigs
[2023-11-20 15:16:39] DEBUG: Writing FASTA
[2023-11-20 15:16:39] DEBUG: Generating scaffold connections
[2023-11-20 15:16:39] INFO: Added 0 scaffold connections
[2023-11-20 15:16:39] DEBUG: Writing Dot
[2023-11-20 15:16:39] DEBUG: Writing FASTA
[2023-11-20 15:16:39] DEBUG: Writing Gfa
[2023-11-20 15:16:39] DEBUG: Peak RAM usage: 0 Gb
-----------End assembly log------------
[2023-11-20 15:16:39] root: ERROR: No contigs were assembled - pipeline stopped
[2023-11-20 15:16:39] root: ERROR: Pipeline aborted
Attaching the full flye log:
flye.log
Any help would be greatly appreciated!
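For anyone hitting the same failure, the key facts in a Flye log like the one above can be pulled out with a few regular expressions. This is only a minimal sketch: the function name `summarize_flye_log` is hypothetical, the sample text is copied from the log in this post, and the patterns assume the log wording stays as shown here.

```python
import re

# Sample lines copied from the log in this post; in practice, read the text
# from flye.log in the Flye output directory instead.
SAMPLE_LOG = """\
[2023-11-20 15:16:39] DEBUG: Selected minimum overlap 1000
[2023-11-20 15:16:39] DEBUG: Total sequence: 1821017 bp
[2023-11-20 15:16:39] INFO: Generated 0 contigs
[2023-11-20 15:16:39] root: ERROR: No contigs were assembled - pipeline stopped
[2023-11-20 15:16:39] root: ERROR: Pipeline aborted
"""


def summarize_flye_log(text):
    """Collect the minimum overlap, total bases, contig count and ERROR messages."""

    def first_int(pattern):
        # Return the first captured integer, or None if the line is absent.
        match = re.search(pattern, text)
        return int(match.group(1)) if match else None

    return {
        "min_overlap": first_int(r"Selected minimum overlap (\d+)"),
        "total_bp": first_int(r"Total sequence: (\d+) bp"),
        "contigs": first_int(r"Generated (\d+) contigs"),
        "errors": re.findall(r"ERROR: (.+)", text),
    }


summary = summarize_flye_log(SAMPLE_LOG)
print(summary)
```

A contig count of 0 together with the "No contigs were assembled" error points at the assembly parameters or the input reads rather than at a crash in the wrapper script.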
Hi,
By default, we set the expected assembled HIV size to around 5k (the -g 5k in the command below), which exceeds the roughly 3k region in your current BED setting:
flye --nano-raw /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_ori_r.fasta --threads 8 --out-dir /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_flye -m 1000 -g 5k >/dev/null 2>&1
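The size mismatch can be checked directly from the region printed in the pipeline output (NC_001802.1 1552 4810). The sketch below is illustrative only: `bed_region_length` and `parse_size` are hypothetical helpers, not part of ClusterV, and it assumes the coordinates follow the usual BED convention where the length is end minus start.

```python
def bed_region_length(bed_line):
    """Length in bp of a BED interval given as 'chrom start end'."""
    # Assumption: standard BED coordinates, so length = end - start.
    _chrom, start, end = bed_line.split()[:3]
    return int(end) - int(start)


def parse_size(size):
    """Convert a size string such as '5k' or '2.5k' to base pairs."""
    size = size.lower()
    if size.endswith("k"):
        return int(float(size[:-1]) * 1000)
    return int(size)


region_bp = bed_region_length("NC_001802.1 1552 4810")
default_bp = parse_size("5k")

print(f"target region: {region_bp} bp, default genome size: {default_bp} bp")
if default_bp > region_bp:
    print("The default genome size is larger than the target region.")
```

For this region the length is 3258 bp, well under the 5k default.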
To analyze data with a smaller amplicon size, could you please run the following to test whether Flye completes without error?
flye --nano-raw /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_ori_r.fasta --threads 8 --out-dir /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_flye -m 500 -g 2.5k >/dev/null 2>&1
If this command runs without any issues, you can rerun your analysis by adding the following options to your pipeline command:
python cv.py ClusterV ... \
--flye_genome_size 2.5k --flye_genome_size_olp 500
If not, could you please share your BAM file with me? I will investigate the problem in my local environment.
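When re-running the test, it may also help to keep Flye's output instead of sending it to /dev/null, so the reason for a failure is visible straight away. The following is a rough sketch, not how ClusterV itself runs Flye: `build_flye_command` and `run_and_report` are hypothetical helpers, the file names are placeholders, and only flags that already appear in the commands in this thread are used.

```python
import subprocess


def build_flye_command(reads, out_dir, min_overlap, genome_size, threads=8):
    """Build the Flye argument list from the flags used in this thread."""
    return [
        "flye", "--nano-raw", reads,
        "--threads", str(threads),
        "--out-dir", out_dir,
        "-m", str(min_overlap),
        "-g", genome_size,
    ]


def run_and_report(cmd, tail_lines=5):
    """Run cmd, merge stderr into stdout, and return (exit code, last lines)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep messages instead of discarding them
        text=True,
    )
    tail = result.stdout.strip().splitlines()[-tail_lines:]
    return result.returncode, tail


# Placeholder paths; substitute the real FASTA and output directory.
cmd = build_flye_command("reads.fasta", "flye_out", 500, "2.5k")
print(" ".join(cmd))

# Uncomment on a machine where Flye is installed:
# returncode, tail = run_and_report(cmd)
# print(returncode, *tail, sep="\n")
```

A non-zero exit code together with the last few output lines is usually enough to tell an assembly failure from an environment problem.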
JH
Hi, I ran it as suggested, but I still get the same issue. What email address should I send the BAM file to?
Updating to v1.2 solves the problem.