mikolmogorov/Flye

Assembly graph with several collapsed contigs

pguenzi-tiberi opened this issue · 2 comments

Hello everyone,

First of all, thank you very much for what you have done and are currently doing for the community with this tool!

Over the last month, we have assembled the genomes of two different strains that belong to the same species (a green alga) based on phylogenetic markers. For each strain, I used Inspector (https://github.com/ChongLab/Inspector) to check the quality of the assembly. For one strain, everything seems correct: Inspector did not detect any errors, and the graph looks odd for only one big contig, as you can see in the image below.
image

For the other strain, the graph is very odd. It looks like many sequences are shared between many edges, and I don't understand whether this is biological or a Flye error. Inspector has detected collapsed contigs. Do these odd structures in the graph correspond to the collapsed regions?
How do we solve this problem?
image

For the first assembly, I used PacBio HiFi reads. For the second, I used PacBio CLR reads (i.e. not HiFi).
First command line:
flye --pacbio-hifi /bettik/guenzitp/data/HiFi.fastq.gz -i 1 -t 16 --out-dir ./flye_assembly_first
Second command line:
flye --pacbio-raw /bettik/guenzitp/data/subreads.fasta.gz -i 1 -t 16 --out-dir ./flye_assembly_second

Thank you very much!

Hi,

The collapsed edges likely represent unresolved repeats, and the high-degree tangles in the assembly graph likely represent telomeres. The graphs probably differ because you are using HiFi mode for the first assembly and CLR mode for the second; the two modes use quite different assembly parameters. I don't think there is anything wrong with the assembly.
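If it helps to locate the high-degree tangles without a graph viewer, here is a rough sketch (not part of Flye) that counts how many distinct neighbours each edge has in a GFA 1 file such as the assembly_graph.gfa that Flye writes to the output directory. The function names, the toy graph, and the threshold of 4 neighbours are assumptions chosen for illustration only; pick a cutoff that suits your graph.

```python
from collections import defaultdict


def segment_degrees(gfa_lines):
    """Return {segment name: number of distinct linked segments} for GFA 1 lines."""
    neighbours = defaultdict(set)
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            # Touch the key so segments without any link get degree 0.
            neighbours[fields[1]]
        elif fields[0] == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap>
            src, dst = fields[1], fields[3]
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    return {name: len(adj) for name, adj in neighbours.items()}


def high_degree(degrees, threshold=4):
    """Segments with at least `threshold` neighbours (arbitrary cutoff), highest first."""
    hits = [name for name, deg in degrees.items() if deg >= threshold]
    return sorted(hits, key=lambda name: (-degrees[name], name))


# Toy graph (hypothetical): edge_5 is a hub shared by four other edges.
example = [
    "S\tedge_1\t*",
    "S\tedge_2\t*",
    "S\tedge_3\t*",
    "S\tedge_4\t*",
    "S\tedge_5\t*",
    "S\tedge_6\t*",
    "L\tedge_1\t+\tedge_5\t+\t0M",
    "L\tedge_2\t+\tedge_5\t+\t0M",
    "L\tedge_3\t-\tedge_5\t+\t0M",
    "L\tedge_4\t+\tedge_5\t-\t0M",
    "L\tedge_1\t+\tedge_2\t+\t0M",
]

degrees = segment_degrees(example)
hubs = high_degree(degrees)
print(hubs)
```

To run it on a real assembly, replace the toy list with an open file handle, for example `segment_degrees(open("flye_assembly_second/assembly_graph.gfa"))`. Edges that show up as hubs are candidates for the shared repeat or telomeric sequence; a high neighbour count on its own does not indicate a misassembly.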

Misha

Assuming this is resolved, feel free to follow up if you have more questions!