Change the algorithm for positional k-mer space

Question

slowikj opened this issue 4 years ago · 0 comments

The integer that represents the position of a k-mer in a sequence tends to be larger than the relatively small P constants used in the hash function formula. It is therefore not recommended to feed the position into the hash of a sequence.
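To make the concern concrete, here is a minimal sketch. It assumes the hash is a polynomial hash with base P, which the issue does not spell out. The names `poly_hash`, `positional_hash`, `ENC`, and the values of `P` and `M` are hypothetical, chosen only for illustration. When the position is folded in as if it were one more character, any position greater than or equal to P can produce the same value as a different k-mer at another position:

```python
# Hypothetical constants: a small base that fits a 4-letter alphabet,
# and a large prime modulus. Neither is taken from the project.
P = 5
M = 2**61 - 1

# Hypothetical encoding of nucleotides as small integers.
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}


def poly_hash(values, p, m):
    """Fold a list of integers into one hash with Horner's rule."""
    h = 0
    for v in values:
        h = (h * p + v) % m
    return h


def positional_hash(kmer, position, p=P, m=M):
    """Hash a k-mer with its position appended as an extra 'character'."""
    return poly_hash([ENC[c] for c in kmer] + [position], p, m)


# Position 5 is not smaller than P = 5, so it spills into the next "digit".
small_p_collides = positional_hash("AA", 5) == positional_hash("AC", 0)

# With a base larger than any position used here, the two stay distinct.
BIG_P = 1_000_003
large_p_collides = (
    positional_hash("AA", 5, p=BIG_P) == positional_hash("AC", 0, p=BIG_P)
)
```

Under these assumptions `small_p_collides` is true and `large_p_collides` is false, which matches the intuition behind the "large P" option below.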

How to fix it?
There are three possible solutions:

- Use a large P.
- Use a different positional k-mer hashing approach: (d + 1)-dimensional representations of a k-mer, where one extra integer stores the untransformed k-mer position.
- Change the dictionary structure and use a specialized version for the positional variant, for example a two-level approach in which the first level is keyed by the position and the second level by the hash.
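
The two-level idea could be sketched roughly as follows. This is only an illustration under assumptions, not the project's implementation: the function name `count_positional_kmers`, the default base `p`, the modulus `m`, and the use of `ord` to encode characters are all hypothetical. The point is that the position never enters the hash. It only selects the inner dictionary.

```python
from collections import defaultdict


def count_positional_kmers(sequences, k, p=101, m=2**61 - 1):
    """Count k-mers per position using a two-level dictionary.

    First level: position of the k-mer in the sequence.
    Second level: hash of the k-mer itself (position is not hashed).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for pos in range(len(seq) - k + 1):
            h = 0
            for ch in seq[pos:pos + k]:
                # Polynomial hash over the k-mer characters only.
                h = (h * p + ord(ch)) % m
            counts[pos][h] += 1
    return counts


counts = count_positional_kmers(["ACGT", "ACGA"], k=2)
```

In this example both sequences share "AC" at position 0, so `counts[0]` holds a single hash with count 2, while position 2 holds two different hashes (for "GT" and "GA"). A small P remains sufficient here because only alphabet symbols are hashed.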