minhash-lsh-algorithm
There are 43 repositories under the minhash-lsh-algorithm topic.
stanford-futuredata/FAST
End-to-end earthquake detection pipeline via efficient time series similarity search
andrewmcloud/consimilo
A Clojure library for querying large data-sets on similarity
dynatrace-research/set-sketch-paper
SetSketch: Filling the Gap between MinHash and HyperLogLog
gurushida/mnemophonix
A simple audio fingerprinting system
Cheng-Lin-Li/Spark
Python 2.7 code and learning notes for Spark 2.1.1
emarkou/Text-Similarity
Text similarity computation using MinHashing and Jaccard distance on the Reuters dataset
tmpsrcrepo/benchmark_minhash_lsh
Insight Data Engineering fellow project
wherefortravel/minhash-node-rs
MinHash and LSH index written in Rust for Node.js
mehmetaydar/LinstaMatch-Csharp
An improved method of locality-sensitive hashing for scalable instance matching. In this study, we propose a scalable approach for automatically identifying similar candidate instance pairs in very large datasets utilizing minhash-lsh-algorithm in C#.
steven-s/minhash-document-clusters
Minhash clustering of text documents
adriacabeza/Document-similarity-detection-using-hashing
📄 Document similarity detection using hashing
emmajy-li/cmsc643_similar_sets
Project 1: Similar document searching via MinHash and Locality Sensitive Hashing
kazemnejad/text_similarity_search
An easy-to-use script for fast similarity search in textual data (and embedding space) with GPU and multi-core support.
micts/jss
Fast Jaccard similarity search for abstract sets (documents, products, users, etc.) using MinHashing and Locality Sensitive Hashing
rkapsalis/Range-and-similarity-queries
Implementation of a B+ Tree for range and exact match queries and of the LSH algorithm for finding similar documents as measured by Jaccard Similarity.
mandychumt/YelpRecommendationSystem
Recommendation systems for Yelp (collaborative filtering & content-based)
shubhamwaghe/Scalable-Data-Mining
Scalable Data Mining - Assignment submissions
vbarzokas/apache-spark-link-prediction
A set of methods and model evaluation metrics for predicting links in an academic citation network using Apache Spark and Scala
92amartins/minhash-example
MinHash Example
AdrianaMacc/Covid-19-BigData-Project
SARS-CoV-2 genome analysis using Big Data algorithms to find clusters of similar mutations that belong to different clades, mutate together, and generate the corresponding clade.
amitkp57/dbms-correlated-columns-detection
Detecting correlated columns in DBMSs using techniques like Pearson correlation, LSH MinHashing, and random sampling.
LM1997610/Data-Mining
Homework for Advanced Data Mining and Language Technology (DMT) at La Sapienza University of Rome
pramodh941/AMD
Finding Similar Pairs using PySpark
rihenperry/csuci-mscs-thesis-dist-web-crawler
Documents my master's-level thesis work on building a continuous, topical web crawler based on Mercator (1999)
SpydazWebAI-NLP/SpydazWebAI_NLP_Models
Word/Image/Audio Embedding models, Tokenizer models, Ngram language models, MatrixModels, Corpus building, Vocabulary Building, Language modelling
aloobun/minhash_exp
Deduplication: MinHash w/ LSH
MaviVestini/ADM-LT_HW1
First homework for the Advanced Data Mining course
ranieri-unimi/echo
LSH from zero 🦾 native Map-Reduce in PySpark 🚀
xadityax/Locality-Sensitive-Hashing-DNA-Seqs
Implementing Locality Sensitive Hashing for DNA Sequences.
Engrima18/TextData_Mining_ML
Textual data manipulation projects with applications of advanced data mining techniques: recommendation systems, information retrieval systems, search engines, latent sentiment analysis, pagerank, PCA.
FilipePires98/SpellChecker
SpellChecker: an application to check for spelling errors.
LM1997610/ADM_HW4
Homework_4 for Algorithmic Methods for Data Mining (ADM), MSc in Data Science at La Sapienza University of Rome
MaviVestini/ADM_HW4
4th homework for ADM