minhash
There are 115 repositories under the minhash topic.
ekzhu/datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
sourmash-bio/sourmash
Quickly search, compare, and analyze genomic and metagenomic data sets.
Callidon/bloom-filters
JS implementation of probabilistic data structures: Bloom filter (and its derivatives), HyperLogLog, Count-Min Sketch, Top-K and MinHash
mattilyra/LSH
Locality Sensitive Hashing using MinHash in Python/Cython to detect near-duplicate text documents
dnbaker/sketch
C++ implementations of sketch data structures with SIMD parallelism, including Python bindings
bigmlcom/sketchy
Sketching Algorithms for Clojure (bloom filter, min-hash, hyper-loglog, count-min sketch)
YaleDHLab/intertext
Detect and visualize text reuse
src-d/minhashcuda
Weighted MinHash implementation on CUDA (multi-GPU).
dynatrace-oss/hash4j
Dynatrace hash library for Java
serega/gaoya
Locality Sensitive Hashing
beowolx/rensa
High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets
will-rowe/groot
A resistome profiler for Graphing Resistance Out Of meTagenomes
andrewmcloud/consimilo
A Clojure library for querying large datasets by similarity
codelibs/elasticsearch-minhash
Elasticsearch plugin for the b-bit MinHash algorithm
LiveRamp/HyperMinHash-java
Union, intersection, and set cardinality in loglog space
duhaime/minhash
Quickly estimate the similarity between many sets
dynatrace-research/set-sketch-paper
SetSketch: Filling the Gap between MinHash and HyperLogLog
edawson/rkmh
Classify sequencing reads using MinHash.
esteinig/sketchy
Genomic neighbor typing of bacterial pathogens using MinHash 🐀
oertl/probminhash
ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity
dselivanov/LSHR
Locality Sensitive Hashing in R
travisbrady/flajolet
Probabilistic data structures for OCaml
codelibs/minhash
Tools for the b-bit MinHash algorithm.
gurushida/mnemophonix
A simple audio fingerprinting system
gibranfp/Sampled-MinHashing
A method to mine beyond-pairwise relationships using Min-Hashing for large-scale pattern discovery
oertl/bagminhash
BagMinHash - Minwise Hashing Algorithm for Weighted Sets
davidsvy/Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
ekzhu/minhash-lsh
Minhash LSH in Golang
Cheng-Lin-Li/Spark
Python 2.7 code and learning notes for Spark 2.1.1
edawson/mkmh
Generate k-mers/minimizers/hashes/MinHash signatures, optionally with multiple k-mer sizes.
BlaCkinkGJ/catch-me-if-you-can
Plagiarism detector
lgautier/mashing-pumpkins
Minhash and maxhash library in Python, combining flexibility, expressivity, and performance.
EdDuarte/similarity-search-java
Easy-to-use Java similarity algorithms for text and numeric-series
steven-s/text-shingles
k-shingling for text to help compare similarity
markusorsi/mapchiral
Chiral version of the MinHashed Atom-Pair Fingerprint
sourmash-bio/wort
A database for signatures of public genomic sources
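Most of the projects above build on one core idea. Hash every element of a set with many independent hash functions and keep only the minimum value per function. The fraction of positions where two such signatures agree then estimates the Jaccard similarity of the two sets. The following is a minimal, illustrative sketch of that idea in plain Python. It is not the API of any listed library.

It rests on three assumptions:
- The random permutations are approximated by the universal hash family h(x) = (a*x + b) mod P, with P = 2^61 - 1.
- Tokens are reduced to integers with a truncated SHA-1 digest.
- Text is split into character 3-shingles.

All function names are hypothetical.

```python
import hashlib
import random

# Assumption: a Mersenne prime modulus for the hash family h(x) = (a*x + b) mod P,
# used here as a cheap stand-in for truly random permutations.
_P = (1 << 61) - 1


def _base_hash(token: str) -> int:
    # Deterministic 64-bit integer for a token (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:8], "big")


def make_hash_params(num_perm: int, seed: int = 1):
    # One (a, b) pair per simulated permutation; a != 0 keeps each map a bijection mod P.
    rng = random.Random(seed)
    return [(rng.randrange(1, _P), rng.randrange(0, _P)) for _ in range(num_perm)]


def minhash_signature(tokens, params):
    # For each hash function, keep the minimum hash value over the whole token set.
    hashed = [_base_hash(t) for t in set(tokens)]
    if not hashed:
        raise ValueError("cannot build a MinHash signature for an empty set")
    return [min((a * x + b) % _P for x in hashed) for a, b in params]


def estimate_jaccard(sig1, sig2):
    # Fraction of signature positions on which the two sets share the same minimum.
    if len(sig1) != len(sig2):
        raise ValueError("signatures must have the same length")
    return sum(1 for u, v in zip(sig1, sig2) if u == v) / len(sig1)


def exact_jaccard(s1, s2):
    # |A intersect B| / |A union B|, for comparison with the estimate.
    s1, s2 = set(s1), set(s2)
    return len(s1 & s2) / len(s1 | s2)


def shingles(text: str, k: int = 3):
    # Character k-shingles (k = 3 is an arbitrary illustrative choice).
    return {text[i:i + k] for i in range(len(text) - k + 1)}


if __name__ == "__main__":
    params = make_hash_params(num_perm=256)
    a = shingles("minhash estimates jaccard similarity")
    b = shingles("minhash estimates the jaccard similarity")
    sig_a = minhash_signature(a, params)
    sig_b = minhash_signature(b, params)
    print("exact:", exact_jaccard(a, b), "estimated:", estimate_jaccard(sig_a, sig_b))
```

Signature length controls the accuracy trade-off. The standard error of the estimate shrinks roughly as 1/sqrt(num_perm), while storage and comparison cost grow linearly with it. The LSH-based projects in this list go one step further: they band these signatures so that likely-similar pairs can be found without comparing every pair.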