kMeans on Apache Flink

Implementations of several variations of the k-means algorithm on Apache Flink.

k-means [1] is one of the most widely used clustering algorithms. In this project we implement k-means and some of its variations and extensions:

- Original k-means algorithm
- Mini-batch k-means [2]
- k-means++ [3,4]
- Bisecting k-means [5]
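
For orientation, the original k-means algorithm (Lloyd's iteration, see [1]) can be sketched in plain Java as below. This is a minimal illustration and not the project's Flink implementation: it uses no Flink API, and the class name `KMeansSketch` and its methods are hypothetical names chosen for this example. It assumes dense points of equal dimension, and leaves a centroid unchanged if no point is assigned to it.

```java
import java.util.Arrays;

// Plain-Java sketch of Lloyd's k-means iteration (hypothetical names, no Flink API).
final class KMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = squaredDistance(point, centroids[c]);
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // One iteration: assign every point to its nearest centroid, then
    // replace each centroid by the mean of the points assigned to it.
    static double[][] step(double[][] points, double[][] centroids) {
        int k = centroids.length;
        int dim = centroids[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearest(p, centroids);
            counts[c]++;
            for (int d = 0; d < dim; d++) {
                sums[c][d] += p[d];
            }
        }
        double[][] next = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                // Assumption of this sketch: an empty cluster keeps its old centroid.
                next[c] = centroids[c].clone();
            } else {
                next[c] = sums[c];
                for (int d = 0; d < dim; d++) {
                    next[c][d] /= counts[c];
                }
            }
        }
        return next;
    }

    // Repeat the step until the centroids stop changing or maxIterations is reached.
    static double[][] lloyd(double[][] points, double[][] initial, int maxIterations) {
        double[][] centroids = initial;
        for (int i = 0; i < maxIterations; i++) {
            double[][] next = step(points, centroids);
            if (Arrays.deepEquals(next, centroids)) {
                break;
            }
            centroids = next;
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] initial = {{0, 0}, {10, 10}};
        // prints [[0.0, 0.5], [10.0, 10.5]]
        System.out.println(Arrays.deepToString(lloyd(points, initial, 20)));
    }
}
```

The listed variations change different parts of this loop: mini-batch k-means [2] updates centroids from small random samples instead of the full data set, k-means++ [3,4] replaces the choice of `initial` with a careful seeding step, and bisecting k-means [5] repeatedly splits a cluster in two using k-means with k = 2.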

# References

[1] https://en.wikipedia.org/wiki/K-means_clustering

[2] Sculley, David. "Web-scale k-means clustering." Proceedings of the 19th international conference on World wide web. ACM, 2010.

[3] Bahmani, Bahman, et al. "Scalable k-means++." Proceedings of the VLDB Endowment 5.7 (2012): 622-633.

[4] Arthur, David, and Sergei Vassilvitskii. "k-means++: The advantages of careful seeding." Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2007.

[5] Steinbach, Michael, George Karypis, and Vipin Kumar. "A comparison of document clustering techniques." KDD workshop on text mining. Vol. 400. No. 1. 2000.

