mikolmogorov/Flye

Assembly of complex genomes with high rate of genome rearrangements

Closed this issue · 2 comments

Dear Dr Kolmogorov,
I am working with a very unusual sea cucumber genome and cannot obtain a collapsed (mosaic) haploid assembly. I tried several options (--meta, --keep-haplotypes, --no-alt-contigs, and the tweaks ONT recommends for v14 chemistry), but every run produced an excellent haplotype-resolved assembly. I also could not separate the resulting contigs into haplotypes with any of the tools I tested (purge_dups, kmer_dedup, purge_haplotigs); the only effect was the loss of about 30% of genes, according to the BUSCO report.
Can you advise which settings I should try?
Here is some information about the species:
Diploid species;
Genome size about 1.4-1.6Gb (assembled 2.8-3.2Gb);
v14 ONT chemistry;
About 40x coverage (based on 1.4Gb genome size);
Reads N50 14100bp.

Hello,

Sorry for my late response! This looks like the opposite of the problem most users have. If purge_haplotigs did not collapse the assembly, the haplotypes must be quite divergent, to the point that purge_haplotigs treats this as one big haploid assembly.

You can experiment with increasing --read-error to higher values (e.g. 5-10%, which Flye expects as a fraction such as 0.05-0.10) or with using --nano-raw instead of --nano-hq, although this may not help. The right approach is probably to self-align the assembly, identify pairs of homologous contigs, and separate them into two bins. This may not be trivial either, since the relationship between contigs may not be 1:1.
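
The self-alignment idea could be sketched roughly as follows. This is only an illustration, not something Flye or purge_haplotigs does: it assumes the assembly has already been aligned against itself and written in PAF format (for example with minimap2 and a preset such as `-x asm20`, which would need tuning for divergent haplotypes). The function name `bin_homologous_contigs` and the thresholds `min_aln_len` and `min_cov_frac` are hypothetical and chosen for the example.

```python
from collections import defaultdict


def bin_homologous_contigs(paf_lines, min_aln_len=10_000, min_cov_frac=0.5):
    """Greedily split contigs into a primary bin and an alternate bin.

    paf_lines: iterable of PAF records from an assembly-vs-itself alignment.
    Returns (primary, alternate), where alternate maps each moved contig
    to the longer contig it was paired with.
    """
    lengths = {}
    aligned = defaultdict(int)  # (query, target) -> summed aligned query bases

    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF has 12 mandatory columns
            continue
        query, qlen = fields[0], int(fields[1])
        qstart, qend = int(fields[2]), int(fields[3])
        target, tlen = fields[5], int(fields[6])
        lengths[query] = qlen
        lengths[target] = tlen
        span = qend - qstart
        # Ignore self hits and short alignments (likely repeats).
        if query != target and span >= min_aln_len:
            # Simplification: overlapping alignments are counted twice.
            aligned[(query, target)] += span

    alternate = {}
    # Process the strongest pairs first.
    for (query, target), bases in sorted(aligned.items(), key=lambda kv: -kv[1]):
        if query in alternate or target in alternate:
            continue
        # Only the shorter contig of a pair is moved; ties are broken by name.
        is_shorter = (lengths[query], query) < (lengths[target], target)
        if is_shorter and bases / lengths[query] >= min_cov_frac:
            alternate[query] = target

    primary = sorted(name for name in lengths if name not in alternate)
    return primary, alternate
```

Contigs that never appear in the PAF file are not reported at all and would have to be added to the primary bin separately. Because the pairing is greedy, a contig homologous to several partners is only handled approximately, which is exactly the non-1:1 case mentioned above; BUSCO on each bin would be a reasonable sanity check.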

Assuming that this has been answered. Feel free to follow up if you have more questions!