/mapreduce_minhash_lsh

A simple implementation of MinHash LSH in Hadoop MapReduce

Primary language: Java

minHash LSH

A simple implementation of the MinHash algorithm in MapReduce. The output is a CSV file listing the candidate pairs that exceed the Jaccard threshold of 0.8.
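
For readers unfamiliar with the technique, the following is a minimal standalone sketch of how a MinHash signature can estimate the Jaccard similarity of two token sets. It is not taken from this repository: the class name MinHashSketch, the hash family (a*x + b mod 2^31 - 1), the number of hash functions, and the reading of 0.8 as a similarity threshold are all assumptions made for illustration. The Hadoop mapper/reducer wiring and the LSH banding step are omitted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration class; not part of this repository.
public class MinHashSketch {
    // Mersenne prime 2^31 - 1, used as the modulus of the hash family (assumption).
    private static final long PRIME = 2147483647L;
    private final long[] a;
    private final long[] b;

    public MinHashSketch(int numHashes, long seed) {
        Random rng = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rng.nextInt(Integer.MAX_VALUE - 1); // in [1, PRIME - 1]
            b[i] = rng.nextInt(Integer.MAX_VALUE);         // in [0, PRIME - 1]
        }
    }

    // For each hash function, keep the minimum hash value over all tokens.
    public long[] signature(Set<String> tokens) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String token : tokens) {
            long x = Math.floorMod((long) token.hashCode(), PRIME);
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME; // fits in a long: a, x, b < 2^31
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    // Fraction of signature positions that agree; this estimates Jaccard similarity.
    public static double estimateSimilarity(long[] s1, long[] s2) {
        int equal = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                equal++;
            }
        }
        return (double) equal / s1.length;
    }

    // Exact Jaccard similarity |A ∩ B| / |A ∪ B|, for comparison with the estimate.
    public static double exactJaccard(Set<String> s1, Set<String> s2) {
        Set<String> intersection = new HashSet<>(s1);
        intersection.retainAll(s2);
        Set<String> union = new HashSet<>(s1);
        union.addAll(s2);
        return union.isEmpty() ? 1.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(128, 42L);
        Set<String> docA = new HashSet<>(Arrays.asList("the", "quick", "brown", "fox", "jumps"));
        Set<String> docB = new HashSet<>(Arrays.asList("the", "quick", "brown", "fox", "sleeps"));
        double estimate = estimateSimilarity(mh.signature(docA), mh.signature(docB));
        // A pair would be reported as a candidate only if it clears the 0.8 threshold.
        System.out.println("exact=" + exactJaccard(docA, docB)
                + " estimate=" + estimate
                + " candidate=" + (estimate >= 0.8));
    }
}
```

In a MapReduce setting, one plausible arrangement is for the mapper to emit signature bands as keys so that documents sharing a band meet in the same reducer, which then compares them and writes the qualifying pairs to the CSV. That arrangement is a common pattern, not a description of this repository's code.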