This repository implements various types of sampling schemes and accompanies the mod-minimizer preprint:
- Ragnar Groot Koerkamp, Giulio Ermanno Pibiri. “The mod-minimizer: a simple and efficient sampling algorithm for long $k$-mers”. bioRxiv (2024). 10.1101/2024.05.25.595898
See also my corresponding blogpost: curiouscoding.nl/posts/minimizers.
Implemented sampling schemes:
- Random minimizers.
- Two versions of asymptotically optimal rotational minimizers (Marçais et al., 2018).
- Miniception, and a slightly improved variant of it.
- Decycling and double decycling based minimizers (Pellow et al., 2023).
- Bidirectional anchors (Loukides et al., 2023).
- Mod-sampling, with lr-minimizers and mod-minimizers (our work).
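As a rough illustration of the first and last items in this list, the sketch below computes the sampled position in a single window for a random minimizer and for a mod-minimizer, and estimates density on a text. It is a naive sketch under stated assumptions, not this repository's implementation: the function names are hypothetical, `DefaultHasher` stands in for a random order on t-mers, the choice `t = r + ((k - r) mod w)` follows the preprint, and falling back to `t = k` when `k <= r` is an assumption made here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pseudo-random order on t-mers (assumption: DefaultHasher as a stand-in).
fn hash_tmer(tmer: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    tmer.hash(&mut h);
    h.finish()
}

/// Offset of the smallest t-mer in `window`; the leftmost one wins ties.
fn min_tmer_pos(window: &[u8], t: usize) -> usize {
    assert!(t >= 1 && t <= window.len());
    (0..=window.len() - t)
        .min_by_key(|&i| hash_tmer(&window[i..i + t]))
        .unwrap()
}

/// Random minimizer: offset of the smallest k-mer in a window of w + k - 1 characters.
fn random_minimizer(window: &[u8], k: usize) -> usize {
    min_tmer_pos(window, k)
}

/// Mod-minimizer: find the smallest t-mer at offset x, then sample the k-mer at x mod w.
fn mod_minimizer(window: &[u8], k: usize, w: usize, r: usize) -> usize {
    assert!(r >= 1 && w >= 1);
    assert_eq!(window.len(), w + k - 1);
    // Assumption: fall back to t = k (a plain random minimizer) when k <= r.
    let t = if k > r { r + (k - r) % w } else { k };
    min_tmer_pos(window, t) % w
}

/// Fraction of k-mers of `text` sampled by `scheme`, which maps a window to an offset.
fn density(text: &[u8], k: usize, w: usize, scheme: impl Fn(&[u8]) -> usize) -> f64 {
    let mut sampled: Vec<usize> = text
        .windows(w + k - 1)
        .enumerate()
        .map(|(start, win)| start + scheme(win))
        .collect();
    sampled.sort_unstable();
    sampled.dedup();
    sampled.len() as f64 / (text.len() - k + 1) as f64
}
```

For example, `density(text, k, w, |win| mod_minimizer(win, k, w, 1))` gives the fraction of sampled k-mers for a byte string `text`. The sketch rehashes every t-mer in every window, so it is only suitable for small inputs.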
To reproduce the density plot, run
cargo run -r -- -n 5000000 -s 4 eval -o data/density_4.json
and then plot the resulting data with plot.ipynb.
Additionally, the benches/blog directory contains a fast implementation of ntHash and a fast random minimizer implementation on top of that. This computes the minimizers of a human genome in under half a second, or 0.16 ns/window.
See the corresponding blogpost for details: curiouscoding.nl/posts/fast-minimizers
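
For readers new to the problem: once every k-mer has a hash, finding random minimizers amounts to a sliding-window minimum over those hashes. The sketch below shows one standard way to do this in amortized constant time per window, using a monotone queue. It is a generic illustration and not necessarily the technique used in benches/blog; the function name is hypothetical.

```rust
use std::collections::VecDeque;

/// For each window of `w` consecutive hashes, return the index of its
/// minimum, taking the leftmost index on ties.
fn sliding_min_positions(hashes: &[u64], w: usize) -> Vec<usize> {
    assert!(w >= 1);
    // Indices whose hashes are non-decreasing from front to back.
    let mut queue: VecDeque<usize> = VecDeque::new();
    let mut out = Vec::new();
    for i in 0..hashes.len() {
        // Drop candidates that are strictly larger than the new hash:
        // they can never be the minimum of a window containing i.
        while let Some(&j) = queue.back() {
            if hashes[j] > hashes[i] {
                queue.pop_back();
            } else {
                break;
            }
        }
        queue.push_back(i);
        // Drop the front candidate once it falls out of the current window.
        if let Some(&j) = queue.front() {
            if j + w <= i {
                queue.pop_front();
            }
        }
        // The first full window ends at index w - 1.
        if i + 1 >= w {
            out.push(*queue.front().unwrap());
        }
    }
    out
}
```

Each index is pushed and popped at most once, so the total work is linear in the number of hashes regardless of `w`.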