online-k-clustering


Primary language: Java

This project implements the online k-clustering algorithm described in these lecture notes (http://cseweb.ucsd.edu/~dasgupta/291/lec6.pdf). It produces a real-time, distributed k-clustering of an infinite stream of data (yes, you read that right: it's real-time :-)). It is built on top of Twitter Storm and uses Cassandra as its distributed database. It works with 2-dimensional matrices and clusters points in Euclidean space.
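To make "online clustering of 2-D points in Euclidean space" concrete, here is a minimal sketch of one common online scheme, MacQueen-style sequential k-means. This is an illustration only: it is not necessarily the algorithm from the linked notes, and it runs in a single JVM without Storm or Cassandra. The class and method names (`OnlineKMeans`, `add`, `center`) are hypothetical. Each point is seen exactly once: the first k points seed the centers, and every later point joins its nearest center, which then moves to the running mean of its members.

```java
import java.util.Arrays;

// Hypothetical sketch: MacQueen-style sequential (online) k-means on 2-D points.
// Not taken from this project's source; not necessarily the algorithm in the notes.
class OnlineKMeans {
    private final int k;
    private final double[][] centers; // centers[i] = {x, y}
    private final long[] counts;      // number of points absorbed by each center
    private int seeded = 0;           // how many centers have been initialised so far

    OnlineKMeans(int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
        this.centers = new double[k][2];
        this.counts = new long[k];
    }

    /** Consumes one 2-D point from the stream and returns the index of the cluster it joins. */
    int add(double x, double y) {
        // The first k points each seed their own center.
        if (seeded < k) {
            centers[seeded][0] = x;
            centers[seeded][1] = y;
            counts[seeded] = 1;
            return seeded++;
        }
        // Find the nearest center by squared Euclidean distance.
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < k; i++) {
            double dx = x - centers[i][0];
            double dy = y - centers[i][1];
            double d = dx * dx + dy * dy;
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        // Incremental mean update: c <- c + (p - c) / n keeps c equal to the mean of its members.
        counts[best]++;
        centers[best][0] += (x - centers[best][0]) / counts[best];
        centers[best][1] += (y - centers[best][1]) / counts[best];
        return best;
    }

    /** Returns a copy of center i as {x, y}. */
    double[] center(int i) {
        return Arrays.copyOf(centers[i], 2);
    }

    public static void main(String[] args) {
        OnlineKMeans km = new OnlineKMeans(2);
        double[][] stream = { {0, 0}, {10, 10}, {1, 1}, {9, 9}, {2, 0}, {11, 10} };
        for (double[] p : stream) {
            int c = km.add(p[0], p[1]);
            System.out.println(Arrays.toString(p) + " -> cluster " + c);
        }
        System.out.println("center 0 = " + Arrays.toString(km.center(0)));
        System.out.println("center 1 = " + Arrays.toString(km.center(1)));
    }
}
```

Running `main` feeds six points from two well-separated groups. They alternate between clusters 0 and 1, and the centers end up at the means of their members, (1, 1/3) and (10, 29/3). Seeding with the first k points is the simplest choice, but it makes the result sensitive to the order of the stream.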
Note: You can read more about Twitter Storm here (https://github.com/nathanmarz/storm/). This project runs the algorithm in Storm's local mode rather than on an actual cluster, but the same implementation can be ported to a real cluster with very few changes.