DistClust_via_LSH_L2

This repository contains the synthetic-data experiments from the paper "Distributed Clustering via LSH Based Data Partitioning" (ICML 2018). The code is not a distributed implementation: it runs on a single machine and is intended to demonstrate the properties of the technique and its approximation results.
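As a rough illustration of what partitioning data by LSH means, here is a minimal sketch. It assumes a standard random-projection hash for Euclidean (L2) distance, h(x) = floor((a . x + b) / w), and groups points by the tuple of their hash values. The function names, the bucket width, and the number of hashes are hypothetical choices made for this example; they are not taken from the scripts in this repository or from the paper. The sketch uses only the standard library so that it runs on its own.

```python
import math
import random
from collections import defaultdict


def make_hashes(dim, n_hashes, width, rng):
    """Draw n_hashes random (direction, offset) pairs (hypothetical helper)."""
    return [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(n_hashes)
    ]


def bucket_key(point, hashes, width):
    """Return the tuple of floor((a . x + b) / width) over all hash functions."""
    return tuple(
        int(math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / width))
        for a, b in hashes
    )


def partition(points, hashes, width):
    """Group point indices by their LSH bucket key."""
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[bucket_key(point, hashes, width)].append(idx)
    return buckets


rng = random.Random(0)
# Two synthetic 2-D blobs, assumed here purely for illustration.
points = [(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(50)]
points += [(rng.gauss(100.0, 0.1), rng.gauss(100.0, 0.1)) for _ in range(50)]

width = 4.0  # assumed bucket width
hashes = make_hashes(dim=2, n_hashes=3, width=width, rng=rng)
buckets = partition(points, hashes, width)

for key, members in sorted(buckets.items()):
    print(key, len(members))
```

Every point lands in exactly one bucket, and nearby points are more likely to share a bucket than distant ones. Increasing the number of hashes makes the buckets finer.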

Requirements:
	- Python 2.7
	- numpy
	- scikit-learn
	- matplotlib

Commands:

To examine the properties of LSH-based clustering, run: python n_hashes_vs_counts_experiments.py
To compare the approximation results with k-means++, run: python plsh_vs_kmeans_comp.py
For the other experiments, run: python plsh_experiments.py