gtonkinhill/panaroo

Genes missing in the pangenome reference fasta

LucoDevro opened this issue · 2 comments

Hi, me again,

There still appear to be issues with the pangenome reference generator. Recently, I wanted to build a core gene phylogeny using PhyloPhlAn with the core genes identified by Panaroo, but when I tried to extract their reference sequences from the pangenome reference fasta, some of the genes were missing.

For example, in my case, Panaroo identified 186,738 genes in total, but the pangenome reference fasta only contains 179,435. Of the 103 core genes, only 88 are present (I'm doing a genus-level analysis, which is why these numbers may look a bit odd).

Does this sound like an issue to you, or am I missing some intermediate filtering or gene merging step? I did not set the option to merge paralogs.

Hi,

This is likely due to paralogs. By default, Panaroo will only report one representative of each group of paralogous clusters in the pangenome reference fasta.
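To see exactly which gene clusters are absent from the reference fasta, a comparison along these lines may help. It is only a sketch: it assumes the fasta headers carry the same names as a `Gene` column in the gene presence/absence table, and the helper names (`fasta_ids`, `table_genes`, `missing_from_reference`) are made up for illustration, not part of Panaroo.

```python
import csv
import io


def fasta_ids(handle):
    """Collect record IDs (first token after '>') from a FASTA stream."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def table_genes(handle, column="Gene"):
    """Collect gene names from one column of a CSV table.

    Assumption: the table has a header row containing `column`.
    """
    reader = csv.DictReader(handle)
    return {row[column] for row in reader if row.get(column)}


def missing_from_reference(table_handle, fasta_handle, column="Gene"):
    """Return gene names listed in the table but absent from the FASTA."""
    return sorted(table_genes(table_handle, column) - fasta_ids(fasta_handle))


# Toy data standing in for the real output files.
toy_table = io.StringIO(
    "Gene,Annotation\n"
    "group_1,kinase\n"
    "group_2,transposase\n"
    "group_3,hypothetical protein\n"
)
toy_fasta = io.StringIO(">group_1\nATGC\n>group_3 some description\nATGA\n")

missing = missing_from_reference(toy_table, toy_fasta)
print(missing)
```

With real output, the two `io.StringIO` objects would be replaced by open file handles, for example `open("gene_presence_absence.csv", newline="")` and `open("pan_genome_reference.fa")`, assuming those are the file names in your output directory. Checking whether the genes reported as missing are the paralogous ones would confirm the explanation above.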

All right. Thanks for clearing that up.