pyspark-adalsh

PySpark implementation of Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Spark Adaptive LSH

Top-K Entity Resolution for Apache Spark. The algorithm is described in the paper "Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing" by Vasilis Verroios and Hector Garcia-Molina of Stanford University, available here. Some of the Adaptive LSH code is based on the pyspark-lsh project, an implementation of the classic LSH technique.
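For readers unfamiliar with the classic LSH technique mentioned above, the following is a minimal, Spark-free sketch of MinHash signatures with banding. It is not this project's API and it is not the adaptive variant from the paper. All names (`make_hash_params`, `minhash_signature`, `candidate_pairs`) and parameter values are hypothetical and chosen for illustration only, and each record is assumed to be a non-empty set of string tokens.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used as the hash modulus


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) coefficients for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """Return the MinHash signature of a non-empty collection of string tokens."""
    # crc32 gives a token id that is stable across runs, unlike built-in hash().
    ids = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def candidate_pairs(records, num_bands=10, rows_per_band=2, seed=0):
    """Return pairs of record ids that share at least one identical band.

    `records` maps a record id to its set of tokens (illustrative input format).
    """
    params = make_hash_params(num_bands * rows_per_band, seed)
    buckets = defaultdict(set)
    for rec_id, tokens in records.items():
        sig = minhash_signature(tokens, params)
        for band in range(num_bands):
            start = band * rows_per_band
            # Records whose signatures agree on every row of a band share a bucket.
            key = (band, tuple(sig[start:start + rows_per_band]))
            buckets[key].add(rec_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


records = {
    "r1": {"john", "smith", "stanford"},
    "r2": {"john", "smith", "stanford"},
    "r3": {"maria", "lopez", "berkeley"},
}
pairs = candidate_pairs(records)
```

Records with identical token sets always produce identical signatures, so `("r1", "r2")` is guaranteed to be a candidate pair. Dissimilar records such as `r3` collide only with low probability, which depends on the number of bands and rows per band.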