Assembling canu-corrected reads
Hi,
I am trying to assemble ONT reads that have been corrected with canu, but I am getting a strange error:
Laurens-MacBook-Pro:Ecoli_644 laurencowley$ ../git_repo/miniasm/miniasm -f source_canu_lowcov/Ecoli644.correctedReads.fasta.gz Ecoli644_correctedminimap.paf.gz > Ecoli644_miniasm.gfa
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::1.850_1.00] read 1687161 hits; stored 1257327 hits and 14342 sequences (124608869 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::2.042_1.00] 14340 query sequences remain after sub
[M::ma_hit_cut::2.068_1.00] 1257303 hits remain after cut
[M::ma_hit_flt::2.096_0.99] 1112831 hits remain after filtering; crude coverage after filtering: 54.73
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::2.153_0.99] 14335 query sequences remain after sub
[M::ma_hit_cut::2.179_0.99] 1112203 hits remain after cut
[M::ma_hit_chimeric::2.207_0.99] identified 143 chimeric reads
[M::ma_hit_contained::2.236_0.99] 749 sequences and 10538 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 10294 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 3498 arcs
[M::asg_arc_del_multi] removed 1652 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 8 tips
[M::asg_pop_bubble] popped 11 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 345 asymmetric arcs
[M::asg_arc_del_short] removed 1725 short overlaps
[M::asg_cut_tip] cut 8 tips
[M::asg_pop_bubble] popped 127 bubbles and trimmed 0 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 29 asymmetric arcs
[M::asg_arc_del_short] removed 47 short overlaps
[M::asg_cut_tip] cut 1 tips
[M::asg_pop_bubble] popped 15 bubbles and trimmed 0 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 9 asymmetric arcs
[M::asg_arc_del_short] removed 11 short overlaps
[M::asg_cut_tip] cut 1 tips
[M::asg_pop_bubble] popped 4 bubbles and trimmed 0 tips
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 1 asymmetric arcs
[M::asg_arc_del_short] removed 1 short overlaps
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 5: generating unitigs <===
Assertion failed: (sub[id].e - sub[id].s <= ks->seq.l), function ma_ug_seq, file asm.c, line 267.
Abort trap: 6
Do you know why this might be?
Could you check if the input contains duplicated read names?
I don't think so. The reads are in FASTA format, with read names in the format:
143b9016-9b2f-4a9f-b3e7-a87378b5afa2_Basecall_2D_2d NBCOL1105_Ecoli_644_3618_1_ch32_file16_strand
Could you run the following and see if it gives any output?
zcat input.fa.gz | perl -ne 'print "$1\n" if />(\S+)/' | sort | uniq -d
OK, it has output the read names in order and there are no repeats.
That command line would produce no output if there were no duplicated read names. The -d option means to output duplicated strings only.
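For anyone who prefers not to rely on the shell pipeline, the same check can be sketched in Python. This is only an illustration, not part of miniasm or minimap: the function names are made up here, and it assumes (like the \S+ pattern in the Perl one-liner) that the read name is the first whitespace-delimited token on each header line.

```python
import gzip
from collections import Counter


def read_names(path):
    """Yield the first whitespace-delimited token of every FASTA header line."""
    # Assumption: files ending in .gz are gzip-compressed, anything else is plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    yield fields[0]


def duplicated_names(path):
    """Return the sorted list of read names that occur more than once."""
    counts = Counter(read_names(path))
    return sorted(name for name, n in counts.items() if n > 1)
```

Calling duplicated_names("source_canu_lowcov/Ecoli644.correctedReads.fasta.gz") should return an empty list when every name is unique. As with uniq -d, any names it does return are the duplicated ones.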
Oh, I see! Right, I will sort out the file and try miniasm again. Sorry!
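One possible way to sort out the file is to keep only the first record for each read name. The sketch below makes that assumption, which is only reasonable if the duplicated records are true copies of the same read. The function names and the output filename in the usage note are hypothetical, and nothing here comes from miniasm itself.

```python
import gzip


def open_text(path, mode):
    """Open a plain or gzip-compressed file in text mode ('r' or 'w')."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, mode + "t")


def drop_duplicate_records(src, dst):
    """Copy FASTA records from src to dst, keeping the first record per name.

    The name is taken to be the first whitespace-delimited token of the
    header line. Returns the number of records that were dropped.
    """
    seen = set()
    keep = False  # whether the record currently being read is written out
    dropped = 0
    with open_text(src, "r") as fin, open_text(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                keep = name not in seen
                if keep:
                    seen.add(name)
                else:
                    dropped += 1
            if keep:
                fout.write(line)
    return dropped
```

For example, drop_duplicate_records("source_canu_lowcov/Ecoli644.correctedReads.fasta.gz", "Ecoli644.dedup.fasta.gz") would write a deduplicated copy. The overlaps would then presumably need to be recomputed from the new file before rerunning miniasm, since the existing PAF was built from the reads with duplicated names.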
Never mind. I should have let minimap/miniasm check the input names. This is actually a common error.