/Mini-batch-k-Means-Clustering

Implement mini-batch k-means in the PySpark distributed framework and test the algorithm's performance on standard synthetic datasets.

Primary language: Jupyter Notebook

Web-Scale K-Means Clustering

Management and Analysis of Physical Datasets project

Implement and benchmark alternative versions of common clustering algorithms in a Spark environment, without using the built-in functions that already provide them.

The project therefore focuses on implementing these algorithms efficiently in a distributed system.

Main topics:

Mini-batch k-means, k-means++, k-means||
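
To make two of these topics concrete, here is a minimal single-machine sketch of mini-batch k-means with k-means++ seeding. It is not the notebook's PySpark code: it uses only the Python standard library, and every name in it (`sq_dist`, `nearest`, `kmeans_pp_init`, `mini_batch_kmeans`, and the `batch_size`, `n_iter`, `seed` parameters) is a hypothetical choice made for illustration. It assumes the input is a list of equal-length numeric tuples containing at least `k` distinct points. k-means|| is not covered.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: the first center is drawn uniformly; each later
    center is drawn with probability proportional to its squared distance
    from the nearest center chosen so far."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centers


def mini_batch_kmeans(points, k, batch_size=32, n_iter=200, seed=0):
    """Mini-batch k-means: each iteration samples a small batch, assigns
    every batch point to its nearest center, then moves each center toward
    its assigned points with a per-center step size of 1 / count."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    counts = [0] * k  # number of points each center has absorbed so far
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Cache all assignments before updating any center.
        assigned = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # step size shrinks as the center matures
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers
```

Calling `mini_batch_kmeans(points, k=3)` returns a list of `k` centers, each a list of coordinates. In a distributed setting the assignment step is the natural candidate for parallelisation across partitions, but how the notebook actually organises that work is not specified here.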