Change the algorithm for positional k-mer space
slowikj opened this issue · 0 comments
An integer representing the position of a k-mer in a sequence tends to be larger than the relatively small P constants used in the hash function formula. Therefore, it is not recommended to feed the position into the hash when hashing a sequence.
How to fix it?
There are two solutions:
- Use a large P
- Use a different positional k-mer hashing approach:
  - use (d + 1)-dimensional representations of k-mers (one extra integer holds the untransformed k-mer position)
  - change the dictionary structure and use a specialized version for the positional variant; for example, a 2-level approach: the first level is the position, the second level is the hash
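
A minimal Python sketch of the second solution, for illustration only. It is not the project's implementation: the constants `P` and `M`, the polynomial hash, and the function names `kmer_hash`, `count_flat`, and `count_two_level` are all assumptions made for this example. In both variants the position stays out of the hash formula: either it is kept as one extra integer next to the hash, or it selects the first-level dictionary.

```python
from collections import defaultdict

P = 101            # hypothetical small hash base (assumption)
M = (1 << 61) - 1  # hypothetical modulus (assumption)


def kmer_hash(kmer: str) -> int:
    """Polynomial hash of the k-mer characters only; the position is not mixed in."""
    h = 0
    for ch in kmer:
        h = (h * P + ord(ch)) % M
    return h


def count_flat(sequences, k):
    """Extra-integer variant: the key is (untransformed position, k-mer hash)."""
    counts = defaultdict(int)
    for seq in sequences:
        for pos in range(len(seq) - k + 1):
            counts[(pos, kmer_hash(seq[pos:pos + k]))] += 1
    return counts


def count_two_level(sequences, k):
    """2-level variant: first level is the position, second level is the hash."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for pos in range(len(seq) - k + 1):
            counts[pos][kmer_hash(seq[pos:pos + k])] += 1
    return counts
```

For example, `count_two_level(["ACGT", "ACGA"], 2)[0][kmer_hash("AC")]` counts how many of the two sequences start with `AC`. Because the position is never multiplied into the hash, its magnitude relative to `P` no longer matters.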