mikolmogorov/Flye

Meanings of both '--keep-haplotypes' and '--no-alt-contigs'.

Jerry-is-a-mouse opened this issue · 7 comments

Dear fenderglass,
I used Flye to assemble the HG002 genome from PacBio HiFi data. With only '--keep-haplotypes' set, the resulting assembly FASTA is 3.90 GB; with both options set, it is 3.88 GB, only 0.02 GB smaller than the first result. The command lines are as follows.
【nohup flye --pacbio-hifi /data/pb/HG002_PacBio.fastq --out-dir /flye_pb/ --keep-haplotypes --threads 64 --genome-size 3.1g &】
【nohup flye --pacbio-hifi /data/pb/HG002_PacBio.fastq --out-dir /flye_pb/ --keep-haplotypes --no-alt-contigs --threads 64 --genome-size 3.1g &】
I am puzzled: the '--keep-haplotypes' option means "do not collapse alternative haplotypes", and you have said that this option helps to keep alternate contigs. Does the output assembly FASTA then contain both primary and alternate contigs?
Also, '--no-alt-contigs' means "do not output contigs representing alternative haplotypes". Does this mean the option excludes alternate contigs from the output assembly FASTA?
The following two paragraphs are copied from your documentation: https://github.com/fenderglass/Flye/blob/flye/docs/USAGE.md. As I understand it, '--no-alt-contigs' is only suited to the default mode and not to haplotype mode, right?
Also, if I want only primary contigs, or a single pseudo-haplotype, in the assembly FASTA, which options do I need to set? And can Flye output primary and alternate contigs to separate FASTA files (e.g. primary.fasta and alternate.fasta)? If so, which options are needed?
##########
Haplotype mode
By default, Flye (and metaFlye) collapses graph structures caused by alternative haplotypes (bubbles, superbubbles, roundabouts) to produce longer consensus contigs. The option --keep-haplotypes retains the alternative paths on the graph, producing a less contiguous but more detailed assembly.
Removing alternative contigs
In default mode, Flye is performing collapsed/haploid assembly, but may output contigs representing alternative alleles if they differ significantly from the "primary" assembled allele. To disable output of alternative contigs, use the --no-alt-contigs option.
##########

If you only want primary contigs, you can set just the --no-alt-contigs option. Unfortunately, Flye currently does not output alternative contigs into a separate file. If you want a diploid assembly, you can try Hapdup (https://github.com/KolmogorovLab/hapdup).
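Since Flye does not write a separate file itself, one possible workaround is to split assembly.fasta after the run. The sketch below is not part of Flye. It assumes that assembly_info.txt is tab-delimited, that its seventh column is the alternative group, and that primary contigs carry `*` in that column, as the manual appears to describe; please check this against the output of your Flye version. The function names and the output file names are hypothetical.

```python
import csv


def read_primary_names(info_path):
    """Collect contig names whose alt_group column is '*'.

    Assumption: column 7 of assembly_info.txt is the alternative group
    and '*' marks a primary contig. Verify this for your Flye version.
    """
    primary = set()
    with open(info_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip the '#'-prefixed header and any short or blank rows.
            if len(row) < 7 or row[0].startswith("#"):
                continue
            if row[6] == "*":
                primary.add(row[0])
    return primary


def split_fasta(fasta_path, primary_names, primary_out, alt_out):
    """Copy each FASTA record to primary_out or alt_out by contig name."""
    with open(fasta_path) as src, \
            open(primary_out, "w") as prim, \
            open(alt_out, "w") as alt:
        target = None
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                # Anything not listed as primary goes to the alternate file.
                target = prim if name in primary_names else alt
            if target is not None:
                target.write(line)
```

A call such as `split_fasta("assembly.fasta", read_primary_names("assembly_info.txt"), "primary.fasta", "alternate.fasta")` would then produce the two files asked about above. Contigs missing from assembly_info.txt end up in the alternate file, so it is worth confirming that the record counts add up.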

@fenderglass Thank you very much for your reply. I still have a few points of confusion and would appreciate your help.
I tried four option combinations in Flye using PacBio HiFi reads, keeping all other options the same. The resulting HG002 assembly sizes are as follows.

  1. --keep-haplotypes: 3.90 GB
  2. --keep-haplotypes and --no-alt-contigs: 3.88 GB
  3. --no-alt-contigs: 3.83 GB
  4. Neither: 3.44 GB

So I would like to know:

  1. Does --asm-coverage need to be set?
  2. Since "--keep-haplotypes" means "do not collapse alternative haplotypes", is some haplotype information lost when only "--no-alt-contigs" is used? What is the difference in the output FASTA between setting --no-alt-contigs alone and setting both options?
  3. If neither option is set, does Flye output primary and alternate contigs together in assembly.fasta?

Best wishes!

@fenderglass Also, does the '--keep-haplotypes' option of haplotype mode conflict with '--no-alt-contigs', which is described as being for default mode?

@Jerry-is-a-mouse no need to set --asm-coverage. How Flye marks alternative contigs should be described in the manual - please take a look and let me know if you have any questions.
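To see how much of each run's output is flagged as alternative, one option is to total the contig lengths per category from assembly_info.txt. This is only a sketch, under the same assumption that the second column is the contig length, the seventh column is the alternative group, and `*` marks a primary contig. The function name is hypothetical.

```python
import csv


def tally_alt_groups(info_path):
    """Sum contig lengths for primary and alternative contigs.

    Assumption: column 2 is the length, column 7 is the alt group,
    and '*' in column 7 marks a primary contig.
    """
    totals = {"primary": 0, "alternative": 0}
    with open(info_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip the '#'-prefixed header and any short or blank rows.
            if len(row) < 7 or row[0].startswith("#"):
                continue
            key = "primary" if row[6] == "*" else "alternative"
            totals[key] += int(row[1])
    return totals
```

Running this on the assembly_info.txt from each of the four option combinations would show whether the size differences come from contigs marked as alternative or from the primary contigs themselves.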

Assuming this has been answered - feel free to follow up if you have more questions!

So sorry for my late reply. If I want to get only primary contigs, should I set both "--keep-haplotypes" and "--no-alt-contigs", or just "--no-alt-contigs"? You mentioned that "--no-alt-contigs" is used in default mode, which, as I understand it, may differ from the haplotype mode enabled by "--keep-haplotypes".

Yes, just use --no-alt-contigs.