eead-csic-compbio/get_homologues

Phylogeny Question

TommyH-Tran opened this issue · 1 comments

Does anyone know whether it is better to build a phylogeny from the conservative core genes or from the soft-core genes? I know using the conservative core genes is the standard. However, I feel that using the soft core (genes present in 95% of the genomes) might provide higher resolution because of the extra genes included, especially when an outgroup (a closely related species that helps root the tree) is added.

The advantage of core genes, particularly single-copy core genes, is that they allow multiple clusters to be concatenated seamlessly. I guess your intuition that soft-core genes might increase phylogenetic signal is justified, but that will force you to either

  • fill the missing species with gaps ahead of concatenation
  • use a phylogeny-building method able to cope with variable number of taxa in the trees
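To illustrate the first option, here is a minimal Python sketch of gap-filling ahead of concatenation. It is not part of get_homologues; the function name `concatenate_with_gaps`, the dictionary-based input format, and the toy sequences are all assumptions made for this example. It assumes each cluster has already been aligned, so every sequence within a cluster has the same length.

```python
def concatenate_with_gaps(clusters, taxa, gap="-"):
    """Concatenate aligned clusters, padding absent taxa with gap characters.

    clusters: list of dicts mapping taxon name -> aligned sequence
              (hypothetical input format, one dict per gene cluster).
    taxa:     full list of taxon names expected in the supermatrix.
    """
    supermatrix = {taxon: [] for taxon in taxa}
    for cluster in clusters:
        # All sequences in one aligned cluster must share the same width.
        lengths = {len(seq) for seq in cluster.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in a cluster must be aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon missing from a soft-core cluster gets a run of gaps.
            supermatrix[taxon].append(cluster.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}


# Toy data: the second cluster is soft-core (taxon "B" is missing).
clusters = [
    {"A": "ATGC", "B": "ATGA", "C": "ATGT"},
    {"A": "GGA", "C": "GGT"},
]
result = concatenate_with_gaps(clusters, ["A", "B", "C"])
for taxon, seq in result.items():
    print(taxon, seq)
```

Every taxon ends up with a row of the same length, which is what concatenation-based tree builders expect; the cost is that the padded taxa contribute no signal for those clusters.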

A simple way to increase resolution when comparing closely related genomes is often to use nucleotide sequences instead of peptides. Another option, discussed in https://vinuesa.github.io/get_phylomarkers , is to use the pangenome matrix itself to compute a phylogeny. Anything else I might be missing, @vinuesa?
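As a rough idea of how a presence/absence pangenome matrix can carry phylogenetic signal, the sketch below computes pairwise Jaccard distances between genomes. This is only an illustration: the choice of Jaccard distance, the function names, and the toy matrix are assumptions for this example, not necessarily what get_homologues or get_phylomarkers do internally. The resulting distances could then be passed to a distance-based tree method such as neighbor joining.

```python
def jaccard_distance(row_a, row_b):
    """Jaccard distance between two presence/absence vectors (1 = cluster present)."""
    union = sum(1 for a, b in zip(row_a, row_b) if a or b)
    if union == 0:
        return 0.0  # two empty genomes are treated as identical
    shared = sum(1 for a, b in zip(row_a, row_b) if a and b)
    return 1.0 - shared / union


def distance_matrix(matrix):
    """Pairwise distances for a dict of genome name -> presence/absence row."""
    names = sorted(matrix)
    return {(x, y): jaccard_distance(matrix[x], matrix[y]) for x in names for y in names}


# Toy pangenome matrix: rows are genomes, columns are gene clusters (hypothetical data).
pangenome = {
    "genomeA": [1, 1, 1, 0],
    "genomeB": [1, 1, 0, 1],
    "genomeC": [1, 0, 0, 0],
}
distances = distance_matrix(pangenome)
for (x, y), d in distances.items():
    if x < y:
        print(x, y, round(d, 3))
```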