Implementations of several variations of the k-means algorithm on Apache Flink.
k-means [1] is one of the most widely used clustering algorithms. In this project we implement k-means and some of its variations and extensions:
- Original k-means algorithm
- Mini Batch k-means [2]
- k-means++ [3,4]
- Bisecting k-means [5]
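
For orientation, the original k-means iteration that the other variants build on can be sketched in a few lines. This is a minimal, single-machine illustration in plain Python, not the project's Flink code; the function names `kmeans` and `squared_distance`, the random initialization, and the `iterations` and `seed` parameters are assumptions made for this example only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Hypothetical helper: plain k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input points chosen at random.
    centroids = rng.sample(points, k)

    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                mean = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])

        # Stop early once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(sorted(kmeans(data, k=2)))
```

Roughly speaking, the variants change one part of this loop: Mini Batch k-means [2] updates centroids from small random samples rather than the full data set, k-means++ [3,4] replaces the random initialization with a distance-weighted seeding, and Bisecting k-means [5] repeatedly splits a chosen cluster in two.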
# References

[1] https://en.wikipedia.org/wiki/K-means_clustering
[2] Sculley, David. "Web-scale k-means clustering." Proceedings of the 19th international conference on World wide web. ACM, 2010.
[3] Bahmani, Bahman, et al. "Scalable k-means++." Proceedings of the VLDB Endowment 5.7 (2012): 622-633.
[4] Arthur, David, and Sergei Vassilvitskii. "k-means++: The advantages of careful seeding." Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2007.
[5] Steinbach, Michael, George Karypis, and Vipin Kumar. "A comparison of document clustering techniques." KDD workshop on text mining. Vol. 400. No. 1. 2000.