Pangenome question
Hello,
I have used get_homologues and anvi'o to predict the pangenomes of four different species. The anvi'o pangenome sizes were consistently smaller, and I was wondering why that could be. How does get_homologues determine gene clusters? Is it maybe less stringent? I used the methods from "4.9.1 Obtaining a pangenome matrix" with OMCL and COGS to obtain the get_homologues pangenomes.
GET_HOMOLOGUES pangenome: Cpr 3777, Cps 4978, Cac 4488, Ctu 3232
anvi'o pangenome: Cpr 3108, Cps 3590, Cac 3427, Ctu 2907
percent difference:
19.4 Cpr
32.4 Cps
26.8 Cac
10.6 Ctu
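For reference, these figures are consistent with a symmetric percent difference, i.e. the absolute difference divided by the mean of the two sizes. That formula is an assumption inferred from the numbers, not something stated elsewhere in this thread. A minimal Python sketch:

```python
# Pangenome sizes (number of gene clusters) as reported above.
get_homologues = {"Cpr": 3777, "Cps": 4978, "Cac": 4488, "Ctu": 3232}
anvio = {"Cpr": 3108, "Cps": 3590, "Cac": 3427, "Ctu": 2907}


def percent_difference(a, b):
    """Symmetric percent difference: |a - b| relative to the mean of a and b.

    Assumed formula; it reproduces the percentages listed above.
    """
    return abs(a - b) / ((a + b) / 2) * 100


# One rounded value per species, in the same order as the input dictionaries.
differences = {
    species: round(percent_difference(get_homologues[species], anvio[species]), 1)
    for species in get_homologues
}

for species, value in differences.items():
    print(f"{value} {species}")
```

Dividing by the GET_HOMOLOGUES size or by the anvi'o size alone would give different percentages, so it helps to state which denominator is used when comparing tools.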
Hi @TommyH-Tran, if you are using the default parameters, that would mean:
-C min %coverage in BLAST pairwise alignments (range [1-100],default=75)
-S min %sequence identity in BLAST query/subj pairs (range [1-100],default=1 [BDBH|OMCL])
In our experience those defaults have worked well across bacterial groups in general, and the fact that you are using the OMCL-COGS intersection should give more confidence in your set of clusters. Some ideas:
- Check a few clusters private to GET_HOMOLOGUES
- Use -D to ensure all sequences in a cluster share the same Pfam domains; this will increase stringency
- In the original paper (https://journals.asm.org/doi/full/10.1128/aem.02411-13) we already reported that GET_HOMOLOGUES was able to capture many orthogroups missed by OMA
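
As a rough sketch of the first idea, assuming the clusters from each tool have already been loaded as sets of gene identifiers that are shared between the two analyses: the function name and the toy data below are hypothetical and do not reflect the output format of GET_HOMOLOGUES or anvi'o.

```python
def private_clusters(clusters_a, clusters_b):
    """Return clusters of A whose exact gene membership does not occur in B."""
    seen_in_b = {frozenset(cluster) for cluster in clusters_b}
    return [set(cluster) for cluster in clusters_a if frozenset(cluster) not in seen_in_b]


# Made-up example: tool A splits genes g3 and g4 into two clusters,
# while tool B groups them together.
gh_clusters = [{"g1", "g2"}, {"g3"}, {"g4"}, {"g5", "g6"}]
anvio_clusters = [{"g1", "g2"}, {"g3", "g4"}, {"g5", "g6"}]

# Clusters found only in the first clustering, kept in input order.
private = private_clusters(gh_clusters, anvio_clusters)
for cluster in private:
    print(sorted(cluster))
```

Inspecting a handful of such private clusters by hand (sequence length, annotation, alignment coverage) can show whether one tool is splitting a family that the other merges.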
Hope this helps,
Bruno