This is a Go implementation of the ntHash recursive hash function for hashing all possible k-mers in a DNA/RNA sequence.
For more information, read the ntHash paper by Mohamadi et al. or check out their C++ implementation.
This implementation was inspired by Luiz Irber's recent blog post about his cool Rust implementation of ntHash.
I coded this up in Go so that ntHash can be used in my HULK and GROOT projects, but feel free to use it yourselves.
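ntHash is recursive (rolling): only the first k-mer is hashed from scratch. Each following k-mer's hash is derived from the previous one in constant time by rotating it one bit, XOR-ing out the base that leaves the window and XOR-ing in the base that enters it. The sketch below illustrates that forward-strand recurrence in plain Go, independently of this package. The seed constants are assumed to match those in the ntHash C++ code. The names `seeds`, `directHash` and `rollingHashes` are hypothetical and are not part of the nthash API.

```go
package main

import (
	"fmt"
	"math/bits"
)

// seeds assigns one fixed 64-bit value per base. These are assumed to match
// the constants in the ntHash C++ code; any fixed random values would
// demonstrate the recurrence equally well. Bases not in the map hash to 0.
var seeds = map[byte]uint64{
	'A': 0x3c8bfbb395c60474,
	'C': 0x3193c18562a02b4c,
	'G': 0x20323ed082572324,
	'T': 0x295549f54be24456,
}

// directHash hashes one k-mer from scratch as the XOR, over positions i,
// of seeds[kmer[i]] rotated left by k-1-i bits.
func directHash(kmer []byte) uint64 {
	k := len(kmer)
	var h uint64
	for i, b := range kmer {
		h ^= bits.RotateLeft64(seeds[b], k-1-i)
	}
	return h
}

// rollingHashes hashes every k-mer in seq: the first directly, each later
// one in O(1) from its predecessor.
func rollingHashes(seq []byte, k int) []uint64 {
	if k <= 0 || k > len(seq) {
		return nil
	}
	h := directHash(seq[:k])
	hashes := []uint64{h}
	for i := 1; i+k <= len(seq); i++ {
		out, in := seq[i-1], seq[i+k-1]
		// rotate the old hash one bit, cancel the outgoing base (now rotated
		// k bits), and XOR in the incoming base at rotation 0
		h = bits.RotateLeft64(h, 1) ^ bits.RotateLeft64(seeds[out], k) ^ seeds[in]
		hashes = append(hashes, h)
	}
	return hashes
}

func main() {
	seq := []byte("ACGTCGTCAGTCGATGCAGTACGTCGTCAGTCGATGCAGT")
	k := 11
	for i, h := range rollingHashes(seq, k) {
		// each rolling hash should equal the from-scratch hash of the same window
		fmt.Println(h == directHash(seq[i:i+k]), h)
	}
}
```

Every printed line should begin with `true`. Because only the first window is hashed directly, the work per additional k-mer stays constant however large k is.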
To install the package:

go get github.com/will-rowe/nthash

Example usage:
package main

import (
	"log"

	// the package is referred to as ntHash in the code below
	ntHash "github.com/will-rowe/nthash"
)

var (
	sequence = []byte("ACGTCGTCAGTCGATGCAGTACGTCGTCAGTCGATGCAGT")
	kmerSize = 11
)

func main() {
	// create the ntHash iterator using a pointer to the sequence and a k-mer size
	hasher, err := ntHash.New(&sequence, kmerSize)
	// check for errors (e.g. a bad k-mer size choice)
	if err != nil {
		log.Fatal(err)
	}
	// collect the hashes by ranging over the channel returned by the Hash method;
	// canonical hashes are the same for a k-mer and its reverse complement
	canonical := true
	for hash := range hasher.Hash(canonical) {
		log.Println(hash)
	}
}
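The example above passes `canonical := true` to `Hash`. A canonical k-mer hash combines the forward hash with the hash of the reverse complement, so a k-mer and its reverse complement receive the same value regardless of which strand was sequenced. The sketch below assumes the combination is the smaller of the two strand hashes, which is how the original ntHash is understood to do it. This package's `Hash` method may differ in detail. Both hashes are rolled with `math/bits`; the seed constants are assumed, and all names (`seeds`, `comp`, `forwardHash`, `reverseHash`, `canonicalHashes`) are hypothetical rather than part of the nthash API.

```go
package main

import (
	"fmt"
	"math/bits"
)

// seeds holds one fixed 64-bit value per base (assumed to match the ntHash
// reference constants); bases missing from the map, such as N, hash to 0.
var seeds = map[byte]uint64{
	'A': 0x3c8bfbb395c60474,
	'C': 0x3193c18562a02b4c,
	'G': 0x20323ed082572324,
	'T': 0x295549f54be24456,
}

// comp maps each base to its Watson-Crick complement.
var comp = map[byte]byte{'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

// reverseComplement returns the reverse complement of s as a new slice.
func reverseComplement(s []byte) []byte {
	rc := make([]byte, len(s))
	for i, b := range s {
		rc[len(s)-1-i] = comp[b]
	}
	return rc
}

// forwardHash hashes a k-mer on the forward strand from scratch.
func forwardHash(kmer []byte) uint64 {
	k := len(kmer)
	var h uint64
	for i, b := range kmer {
		h ^= bits.RotateLeft64(seeds[b], k-1-i)
	}
	return h
}

// reverseHash hashes the reverse complement of a k-mer without building it.
func reverseHash(kmer []byte) uint64 {
	var h uint64
	for i, b := range kmer {
		h ^= bits.RotateLeft64(seeds[comp[b]], i)
	}
	return h
}

func minU64(a, b uint64) uint64 {
	if a < b {
		return a
	}
	return b
}

// canonicalHashes rolls both strand hashes along seq and keeps the smaller
// of the two for each k-mer.
func canonicalHashes(seq []byte, k int) []uint64 {
	if k <= 0 || k > len(seq) {
		return nil
	}
	fh, rh := forwardHash(seq[:k]), reverseHash(seq[:k])
	hashes := []uint64{minU64(fh, rh)}
	for i := 1; i+k <= len(seq); i++ {
		out, in := seq[i-1], seq[i+k-1]
		fh = bits.RotateLeft64(fh, 1) ^ bits.RotateLeft64(seeds[out], k) ^ seeds[in]
		// a negative rotation count rotates right
		rh = bits.RotateLeft64(rh, -1) ^ bits.RotateLeft64(seeds[comp[out]], -1) ^
			bits.RotateLeft64(seeds[comp[in]], k-1)
		hashes = append(hashes, minU64(fh, rh))
	}
	return hashes
}

func main() {
	seq := []byte("ACGTCGTCAGTCGATGCAGTACGTCGTCAGTCGATGCAGT")
	k := 11
	fwd := canonicalHashes(seq, k)
	rev := canonicalHashes(reverseComplement(seq), k)
	n := len(fwd)
	for i, h := range fwd {
		// k-mer i of seq is the reverse complement of k-mer n-1-i of its reverse complement
		fmt.Println(h == rev[n-1-i], h)
	}
}
```

Each printed line should start with `true`: hashing the sequence or its reverse complement yields the same canonical hashes, just in reverse order.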