sanger-pathogens/snp-sites

Improvement: let user specify pure and ambiguous bases

arturotorreso opened this issue · 0 comments

This is an amazing tool, and I've ended up relying on it quite a lot because of its speed!

One improvement I would suggest is letting the user specify which characters count as a "pure" base and which count as an "unknown" base. Two situations I often run into prompted this:

  1. Often "-" actually represents a genuine polymorphism, and for non-phylogenetic analyses users may want to keep those columns in their SNP alignment.
  2. I often use IUPAC ambiguity codes in my alignments (M, R, W, ...), and a column that contains only the reference base plus an IUPAC code is still kept.

I think the change would be relatively easy to implement. Before compiling, I modified the source code (`is_unknown` and `is_pure` in alignment-file.c) to suit my needs, but other users might benefit from this as well.