Clustering-of-Yelp-Reviews

TASK 1

Implementation of the K-Means algorithm on Yelp reviews using the Euclidean distance measure, with two feature representations: 1) word count and 2) TF-IDF. The algorithm takes as input a file of reviews, the feature type, the number of clusters, and the maximum number of iterations. For each cluster, the ten most frequently occurring features are identified.
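The steps described for Task 1 can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the repository's actual code: the function names, the `"WC"` / `"TFIDF"` feature flags, the regex tokenizer, the `log(N / df)` IDF formula, and the choice to rank a cluster's features by their summed values are all assumptions made for the example.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/digits (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vectors(reviews, feature="WC"):
    """Turn reviews into dense vectors of word counts ("WC") or TF-IDF ("TFIDF")."""
    docs = [Counter(tokenize(r)) for r in reviews]
    vocab = sorted(set().union(*docs))
    if feature == "TFIDF":
        n = len(docs)
        # Document frequency: number of reviews containing each term.
        df = Counter(t for d in docs for t in d)
        idf = {t: math.log(n / df[t]) for t in vocab}
        vectors = [[d[t] * idf[t] for t in vocab] for d in docs]
    else:
        vectors = [[float(d[t]) for t in vocab] for d in docs]
    return vocab, vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, max_iterations, seed=0):
    """Plain K-Means with Euclidean distance; returns (assignments, centroids)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct reviews chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [None] * len(vectors)
    for _ in range(max_iterations):
        new_assignments = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # converged: no review changed cluster
        assignments = new_assignments
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


def top_features(vocab, vectors, assignments, cluster, n=10):
    """Rank features by their summed value over the reviews in one cluster."""
    totals = [0.0] * len(vocab)
    for v, a in zip(vectors, assignments):
        if a == cluster:
            for i, x in enumerate(v):
                totals[i] += x
    ranked = sorted(range(len(vocab)), key=lambda i: totals[i], reverse=True)
    return [vocab[i] for i in ranked[:n]]
```

A call such as `vocab, vectors = build_vectors(reviews, "TFIDF")` followed by `kmeans(vectors, k, max_iterations)` and `top_features(vocab, vectors, assignments, cluster)` mirrors the inputs listed above. Dense lists are used only to keep the sketch short; a real review corpus would likely need sparse vectors.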

TASK 2

Implementation of two clustering algorithms, K-Means and Bisecting K-Means, using Spark on the Yelp Reviews dataset.
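To show how Bisecting K-Means differs from plain K-Means, here is a small plain-Python sketch of the textbook variant. It is not Spark code and not the repository's implementation: the function names, the rule of always splitting the cluster with the largest sum of squared errors, and the random initialization are assumptions chosen for illustration, and Spark's own implementation may choose splits differently.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points, max_iterations, rng):
    """Split one cluster in two with plain K-Means (k = 2)."""
    centroids = rng.sample(points, 2)
    left, right = [], []
    for _ in range(max_iterations):
        left, right = [], []
        for p in points:
            if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]):
                left.append(p)
            else:
                right.append(p)
        if not left or not right:
            break  # degenerate split, e.g. duplicate points
        new_centroids = [mean(left), mean(right)]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return left, right


def bisecting_kmeans(points, k, max_iterations=20, seed=0):
    """Start with one cluster and repeatedly bisect until there are k clusters."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Only clusters with at least two points can be split.
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break
        # Assumed selection rule: split the cluster with the largest SSE.
        target = max(candidates, key=sse)
        left, right = two_means(target, max_iterations, rng)
        if not left or not right:
            break  # could not split further; return fewer than k clusters
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters
```

Because each step runs only a two-way split on one cluster, the result is a hierarchy of splits, whereas plain K-Means fits all k centroids at once. As with any K-Means variant, the outcome depends on the random initialization.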