dib-lab/kmerDecoder

Protein seqs parsing

Closed this issue · 0 comments

kmerDecoder should be able to parse protein sequences, with k-mer sizes up to k = 12.

We have 20 amino acids, so each amino acid needs 5 bits (2^4 = 16 < 20 ≤ 2^5 = 32). That means the largest protein k-mer that fits in a `uint64_t` is floor(64/5) = 12 aa, which uses 60 of the 64 bits.
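To make the 5-bit arithmetic concrete, here is a minimal sketch of packing a protein k-mer into a `uint64_t`. It is not kmerDecoder's API: the names `encode_protein_kmer`, `decode_protein_kmer`, and the residue order in `kAminoAcids` are hypothetical, and only the 20 standard uppercase amino acids are accepted.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical residue order (assumption): each amino acid maps to its index 0..19.
constexpr std::string_view kAminoAcids = "ACDEFGHIKLMNPQRSTVWY";
constexpr unsigned kBitsPerAA = 5;                  // 2^5 = 32 >= 20 residues
constexpr unsigned kMaxProteinK = 64 / kBitsPerAA;  // floor(64 / 5) = 12
static_assert(kMaxProteinK * kBitsPerAA <= 64, "k-mer must fit in 64 bits");

// Return the 5-bit code of a residue, or -1 if it is not a standard amino acid.
int aa_code(char c) {
    auto pos = kAminoAcids.find(c);
    return pos == std::string_view::npos ? -1 : static_cast<int>(pos);
}

// Pack a k-mer (1 <= k <= 12) into a uint64_t, first residue in the highest bits.
std::optional<uint64_t> encode_protein_kmer(std::string_view kmer) {
    if (kmer.empty() || kmer.size() > kMaxProteinK) return std::nullopt;
    uint64_t packed = 0;
    for (char c : kmer) {
        int code = aa_code(c);
        if (code < 0) return std::nullopt;  // unknown residue, e.g. 'X'
        packed = (packed << kBitsPerAA) | static_cast<uint64_t>(code);
    }
    return packed;
}

// Unpack k residues from the low 5*k bits of a packed value.
std::string decode_protein_kmer(uint64_t packed, unsigned k) {
    assert(k >= 1 && k <= kMaxProteinK);
    std::string kmer(k, '?');
    for (unsigned i = 0; i < k; ++i) {
        uint64_t code = packed & 0x1F;  // lowest 5 bits
        kmer[k - 1 - i] = code < kAminoAcids.size() ? kAminoAcids[code] : '?';
        packed >>= kBitsPerAA;
    }
    return kmer;
}
```

With this layout, a 12-mer such as `ACDEFGHIKLMN` encodes successfully, while a 13-mer is rejected because it would need 65 bits.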

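If we used the Dayhoff encoding, which collapses the 20 amino acids into 6 groups so that each residue needs only 3 bits, we could extend the maximum k-mer size to floor(64/3) = 21 aa.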
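A similar sketch for the 3-bit Dayhoff variant follows. The six groups are the standard Dayhoff classes (C | AGPST | DENQ | HKR | ILMV | FWY); the function names and the group-to-code assignment are assumptions for illustration, not kmerDecoder code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

constexpr unsigned kDayhoffBits = 3;                  // 6 groups fit in 2^3 = 8 codes
constexpr unsigned kMaxDayhoffK = 64 / kDayhoffBits;  // floor(64 / 3) = 21
static_assert(kMaxDayhoffK * kDayhoffBits <= 64, "k-mer must fit in 64 bits");

// Map a residue to its Dayhoff group 0..5 (hypothetical numbering), or -1 if unknown.
int dayhoff_group(char aa) {
    switch (aa) {
        case 'C': return 0;
        case 'A': case 'G': case 'P': case 'S': case 'T': return 1;
        case 'D': case 'E': case 'N': case 'Q': return 2;
        case 'H': case 'K': case 'R': return 3;
        case 'I': case 'L': case 'M': case 'V': return 4;
        case 'F': case 'W': case 'Y': return 5;
        default: return -1;
    }
}

// Rewrite a protein string in the reduced alphabet 'a'..'f' ('?' for unknown residues).
std::string dayhoff_letters(std::string_view protein) {
    std::string out;
    out.reserve(protein.size());
    for (char c : protein) {
        int g = dayhoff_group(c);
        out.push_back(g < 0 ? '?' : static_cast<char>('a' + g));
    }
    return out;
}

// Pack a k-mer (1 <= k <= 21) into a uint64_t at 3 bits per residue.
std::optional<uint64_t> encode_dayhoff_kmer(std::string_view kmer) {
    if (kmer.empty() || kmer.size() > kMaxDayhoffK) return std::nullopt;
    uint64_t packed = 0;
    for (char c : kmer) {
        int g = dayhoff_group(c);
        if (g < 0) return std::nullopt;
        packed = (packed << kDayhoffBits) | static_cast<uint64_t>(g);
    }
    return packed;
}
```

Because the mapping is many-to-one, distinct peptides can share a packed value: `KLE` and `RIQ` both reduce to `dec` and therefore encode identically, which is one thing to weigh when comparing this scheme with the 5-bit one.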
I do not know what the drawbacks of using Dayhoff's encoding are.