eead-csic-compbio/get_homologues

ANI coding sequences + noncoding?

Closed this issue · 1 comment

From my understanding, our ANI calculations are based on CDS only, whereas other tools include all sequences (coding and noncoding), which sometimes leads to minor discrepancies in results. Is there any way get_homologues could generate ANIs using both the noncoding and coding sequences?

By default, get_homologues.pl -A reports average identities computed from BLASTP protein alignments. As proteins are more conserved than nucleotide sequences, these values tend to be higher than nucleotide-based ANI.
Instead, get_homologues.pl -A -a 'CDS' will report ANI computed from BLASTN alignments of CDS nucleotide sequences. You can change that to -a 'tRNA,rRNA' if you wish, but you won't be able to apply it to intergenic regions unless they are annotated as features in a GenBank file.
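
For intuition only, here is a minimal sketch of what "ANI computed from BLASTN alignments" boils down to: averaging the percent identity of pairwise hits. It assumes BLAST tabular output (`-outfmt 6`), where the third column is percent identity. The function name `average_identity` and the example hits are hypothetical, and this is not get_homologues' own implementation, which works on clustered sequences.

```python
from statistics import mean


def average_identity(blast_tabular_lines, min_identity=0.0):
    """Return the mean percent identity over BLAST tabular (outfmt 6) hits.

    Assumption: each line is tab-separated and column 3 holds percent identity.
    """
    identities = []
    for line in blast_tabular_lines:
        line = line.strip()
        # Skip blank lines and comment lines (e.g. from outfmt 7)
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        pident = float(fields[2])
        if pident >= min_identity:
            identities.append(pident)
    if not identities:
        raise ValueError("no hits passed the identity filter")
    return mean(identities)


# Made-up BLASTN hits between CDS of two genomes (illustrative values only)
hits = [
    "geneA_1\tgeneB_1\t98.0\t900\t18\t0\t1\t900\t1\t900\t0.0\t1500",
    "geneA_2\tgeneB_2\t94.0\t600\t36\t0\t1\t600\t1\t600\t0.0\t900",
]
ani = average_identity(hits)
print(f"ANI over {len(hits)} hits: {ani:.1f}%")
```

Running the same averaging over BLASTP hits instead of BLASTN hits would typically give a higher value, which is one reason CDS-based and whole-genome ANI figures from different tools may not agree exactly.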