Language: Rust. License: MIT.

minimizer-iter

Iterate over minimizers of a DNA sequence.

Features

  • iterates over minimizers in a single pass
  • yields bitpacked minimizers with their position
  • supports mod-minimizers, introduced by Groot Koerkamp & Pibiri
  • supports canonical minimizers
  • supports custom bit encoding of the nucleotides
  • supports custom hasher, using wyhash by default
  • can be seeded to produce a different ordering

If you'd like to use the underlying data structure manually, have a look at the minimizer-queue crate.
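
To make "bitpacked" concrete, here is a minimal sketch of packing a k-mer into an integer at two bits per base. The `encode_base` table (A=0, C=1, G=2, T=3) and the `pack_kmer` helper are assumptions made for illustration; they are not the crate's API, and the crate's default encoding may differ.

```rust
/// Map a nucleotide to a 2-bit code.
/// Assumption for illustration: A=0, C=1, G=2, T=3.
fn encode_base(base: u8) -> Option<u64> {
    match base {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None,
    }
}

/// Pack a k-mer into a u64, two bits per base.
/// The first base ends up in the most significant used bits.
/// Returns None for k-mers longer than 32 bases or containing an unknown base.
fn pack_kmer(kmer: &[u8]) -> Option<u64> {
    if kmer.len() > 32 {
        return None;
    }
    let mut packed = 0u64;
    for &base in kmer {
        packed = (packed << 2) | encode_base(base)?;
    }
    Some(packed)
}
```

Under this encoding, `pack_kmer(b"TGA")` gives `0b11_10_00`, and a k-mer of 32 bases fills the whole `u64`.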

Example usage

use minimizer_iter::MinimizerBuilder;

// Build an iterator over minimizers
// of size 3 with a window of size 4
// for the sequence "TGATTGCACAATC"
let min_iter = MinimizerBuilder::<u64>::new()
    .minimizer_size(3)
    .width(4)
    .iter(b"TGATTGCACAATC");

for (minimizer, position) in min_iter {
    // ...
}
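
For intuition about what such an iterator yields, the following naive reference sketch selects one minimizer per window and skips repeats. It is not the crate's implementation: it rescans each window instead of working in a single pass, it returns byte slices rather than bitpacked integers, and it orders k-mers lexicographically instead of by hash so the result can be checked by hand. The name `naive_minimizers` is hypothetical.

```rust
/// Naive reference: for every window of `w` consecutive k-mers, pick the
/// smallest k-mer (lexicographic order, leftmost on ties) and report it with
/// its start position, skipping consecutive repeats of the same position.
fn naive_minimizers(seq: &[u8], k: usize, w: usize) -> Vec<(&[u8], usize)> {
    let mut out: Vec<(&[u8], usize)> = Vec::new();
    // Number of bases covered by w consecutive k-mers.
    let window_len = w + k - 1;
    if k == 0 || w == 0 || seq.len() < window_len {
        return out; // the sequence is too short to contain a full window
    }
    for start in 0..=seq.len() - window_len {
        // Start position of the smallest k-mer in this window.
        let pos = (start..start + w)
            .min_by_key(|&i| &seq[i..i + k])
            .unwrap();
        if out.last().map(|&(_, p)| p) != Some(pos) {
            out.push((&seq[pos..pos + k], pos));
        }
    }
    out
}
```

With k = 3 and w = 4 (values chosen here for illustration) on "TGATTGCACAATC", this sketch selects ATT at position 2, CAC at 6, ACA at 7 and AAT at 9. A hash-based order will generally select different k-mers.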

If you'd like to use mod-minimizers instead, change new() to new_mod() and add a second type parameter, which can be left to inference with _:

use minimizer_iter::MinimizerBuilder;

// Build an iterator over mod-minimizers
// of size 5 with a window of size 4
// for the sequence "TGATTGCACAATC"
let min_iter = MinimizerBuilder::<u64, _>::new_mod()
    .minimizer_size(5)
    .width(4)
    .iter(b"TGATTGCACAATC");

for (minimizer, position) in min_iter {
    // ...
}
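
As a rough illustration of the idea behind mod-minimizers, the sketch below follows the mod-sampling definition from Groot Koerkamp & Pibiri as summarized here: in each window of w + k - 1 bases, find the position x of the smallest t-mer, with t = r + ((k - r) mod w), and select the k-mer at position x mod w. The function name, the explicit `r` parameter and the lexicographic order are assumptions for illustration; the crate's implementation and parameter choices may differ.

```rust
/// Naive mod-sampling sketch (not the crate's implementation).
/// `r` is a small lower bound on the t-mer length, with 1 <= r <= k.
fn naive_mod_minimizers(seq: &[u8], k: usize, w: usize, r: usize) -> Vec<(&[u8], usize)> {
    let mut out: Vec<(&[u8], usize)> = Vec::new();
    let window_len = w + k - 1;
    if w == 0 || r == 0 || r > k || seq.len() < window_len {
        return out;
    }
    // Length of the short t-mers used to anchor the selection (r <= t <= k).
    let t = r + (k - r) % w;
    for start in 0..=seq.len() - window_len {
        // Offset of the smallest t-mer among the w + k - t t-mers of the window
        // (lexicographic order, leftmost on ties).
        let x = (0..=window_len - t)
            .min_by_key(|&i| &seq[start + i..start + i + t])
            .unwrap();
        // The selected k-mer starts at offset x mod w within the window.
        let pos = start + x % w;
        if out.last().map(|&(_, p)| p) != Some(pos) {
            out.push((&seq[pos..pos + k], pos));
        }
    }
    out
}
```

For example, with k = 7, w = 4 and r = 3 (so t = 3) on "TGATTGCACAATC", this sketch selects TTGCACA at position 3 and GCACAAT at position 5.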

Additionally, the iterator can produce canonical minimizers so that a sequence and its reverse complement will select the same minimizers. To do so, just add .canonical() to the builder:

MinimizerBuilder::<u64>::new()
    .canonical()
    .minimizer_size(...)
    .width(...)
    .iter(...)
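
To show why canonical selection is strand-independent, here is a minimal sketch that defines the canonical form of a k-mer as the smaller of the k-mer and its reverse complement. The helper names and the lexicographic comparison are assumptions for illustration; the crate may choose between the two strands differently.

```rust
/// Complement of a single base (A<->T, C<->G); other bytes are left unchanged.
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        other => other,
    }
}

/// Reverse complement of a sequence.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}

/// Canonical form: the lexicographically smaller of the k-mer
/// and its reverse complement.
fn canonical(kmer: &[u8]) -> Vec<u8> {
    let rc = reverse_complement(kmer);
    if rc.as_slice() < kmer {
        rc
    } else {
        kmer.to_vec()
    }
}
```

Since a k-mer and its reverse complement map to the same canonical form, both strands of a sequence produce the same set of candidates.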

If you need longer minimizers (> 32 bases), you can specify a bigger integer type such as u128:

MinimizerBuilder::<u128>::new()
    .minimizer_size(...)
    .width(...)
    .iter(...)
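
The 32-base limit follows from simple arithmetic, assuming two bits per base. The hypothetical helper below computes the capacity of an integer type under that assumption:

```rust
/// Maximum number of bases that fit in an integer type `T`,
/// assuming 2 bits per base.
const fn max_bases<T>() -> usize {
    std::mem::size_of::<T>() * 8 / 2
}
```

This gives 32 bases for `u64` and 64 bases for `u128`.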

See the documentation for more details.

Benchmarks

To run benchmarks against other implementations of minimizers, clone this repository and run:

cargo bench
