/minimizers

Reference implementations of minimizer schemes to go with the mod-minimizers paper.

Primary LanguageRust

Minimizer reference implementations

This repository implements various types of sampling schemes and accompanies the mod-minimizer preprint:

  • Ragnar Groot Korekamp, Giulio Ermanno Pibiri. “The mod-minimizer: a simple and efficient sampling algorithm for long $k$-mers”. bioRxiv (2024). 10.1101/2024.05.25.595898

See also my corresponding blogpost: curiouscoding.nl/posts/minimizers.

Implemented sampling schemes:

  • Random minimizers.
  • two versions of asymptotically optimal Rotational minimizers (Marçais et al., 2018) .
  • Miniception, and a small slightly improved variant of it.
  • Decycling and double decycling based minimizers (Pellow et al., 2023).
  • Bidirectional anchors (Loukides et al., 2023)
  • Mod-sampling, with lr-minimizers and mod-minimizers (our work).

Density plot for $σ=4$. Reproduce using:

  • cargo run -r -- -n 5000000 -s 4 eval -o data/density_4.json
  • plot.ipynb

./fig/density_4.svg

Fast random minimizers

Additionally, the benches/blog directory contains a fast implementation of ntHash and a fast random minimizer implementation on top of that. This computes the minimizers of a human genome in under half a second, or 0.16ns/window.

See the corresponding blogpost for details: curiouscoding.nl/posts/fast-minimizers