Protein seqs parsing
Closed this issue · 0 comments
mr-eyes commented
kmerDecoder should be able to parse protein sequences, with a maximum kmer size of k=12.
We have 20 amino acids, and each amino acid can be represented in 5 bits (2^5 = 32 >= 20). So the maximum kmer size that can be stored in a uint64_t is 12aa (12 x 5 = 60 bits; a 13th residue would need 65).
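A minimal sketch of the 5-bit packing, to make the arithmetic concrete. The names `encode_protein_kmer`, `kAminoAcids`, `kBitsPerResidue`, and `kMaxK` are hypothetical and not part of kmerDecoder, and the alphabetical ordering of the residues is an assumption; any fixed ordering would work. Requires C++17.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical names for illustration only; not kmerDecoder's API.
// Assumed ordering: the 20 standard residues, alphabetical by one-letter code.
constexpr std::string_view kAminoAcids = "ACDEFGHIKLMNPQRSTVWY";
constexpr unsigned kBitsPerResidue = 5;               // 2^5 = 32 >= 20
constexpr unsigned kMaxK = 64 / kBitsPerResidue;      // integer division: 12

// Pack a protein kmer into a uint64_t, 5 bits per residue, first residue
// in the most significant position. Returns std::nullopt if the kmer is
// longer than 12 residues or contains a letter outside the 20-letter alphabet.
std::optional<std::uint64_t> encode_protein_kmer(std::string_view kmer) {
    if (kmer.size() > kMaxK) return std::nullopt;
    std::uint64_t packed = 0;
    for (char aa : kmer) {
        const auto code = kAminoAcids.find(aa);
        if (code == std::string_view::npos) return std::nullopt;
        packed = (packed << kBitsPerResidue) | static_cast<std::uint64_t>(code);
    }
    return packed;
}
```

Because `A` maps to code 0 in this ordering, kmers of different lengths can pack to the same integer (for example `C` and `AC`), so k has to be fixed or stored alongside the value.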
If we used "Dayhoff" encoding, which groups the amino acids into 6 classes that fit in 3 bits each, we could extend the kmer size to floor(64/3) = 21aa.
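A comparable sketch for the 3-bit case, assuming the usual six Dayhoff classes (C; AGPST; DENQ; HKR; ILMV; FWY). The class numbering and the names `dayhoff_class`, `encode_dayhoff_kmer`, `kDayhoffBits`, and `kDayhoffMaxK` are hypothetical and not part of kmerDecoder. Requires C++17.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical names for illustration only; not kmerDecoder's API.
constexpr unsigned kDayhoffBits = 3;                  // 2^3 = 8 >= 6 classes
constexpr unsigned kDayhoffMaxK = 64 / kDayhoffBits;  // integer division: 21

// Map a residue to its Dayhoff class (0..5), or -1 for an unknown letter.
// The numbering of the classes is an arbitrary choice made for this sketch.
constexpr int dayhoff_class(char aa) {
    switch (aa) {
        case 'C': return 0;
        case 'A': case 'G': case 'P': case 'S': case 'T': return 1;
        case 'D': case 'E': case 'N': case 'Q': return 2;
        case 'H': case 'K': case 'R': return 3;
        case 'I': case 'L': case 'M': case 'V': return 4;
        case 'F': case 'W': case 'Y': return 5;
        default: return -1;
    }
}

// Pack a protein kmer into a uint64_t, 3 bits per residue class.
// Returns std::nullopt if the kmer is longer than 21 residues or
// contains a letter outside the 20 standard amino acids.
std::optional<std::uint64_t> encode_dayhoff_kmer(std::string_view kmer) {
    if (kmer.size() > kDayhoffMaxK) return std::nullopt;
    std::uint64_t packed = 0;
    for (char aa : kmer) {
        const int cls = dayhoff_class(aa);
        if (cls < 0) return std::nullopt;
        packed = (packed << kDayhoffBits) | static_cast<std::uint64_t>(cls);
    }
    return packed;
}
```

One property visible in this sketch: the mapping is many-to-one, so distinct kmers such as `IL` and `MV` pack to the same integer and the original residues cannot be recovered from the encoded value.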
I do not know what the drawbacks of using Dayhoff's encoding are.