Word-Distribution-Over-Yelp-Review

Performed text mining on Yelp reviews using Python and PySpark. Used PySpark on Apache Spark to extract, clean, and merge the data, which resulted in a dataset of 5.2 million rows and 44 columns. Then identified 5 objectives: top words for 5 locations, by rating, for the top 20 users, for reviews rated useful, and for the users with the most fans. Tableau was then used to showcase our findings.
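
The counting step behind each objective can be illustrated with a minimal sketch. The project itself used PySpark. The plain-Python version below only shows the logic of ranking words within a group (here, star rating) on a tiny made-up sample. The names `STOP_WORDS`, `tokenize`, `top_words_by_group`, and `sample_reviews` are hypothetical and do not come from the project code, and the stop-word list is an assumption.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real run would likely use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "it", "to", "of", "but"}


def tokenize(text):
    """Lowercase the text and return its word tokens, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def top_words_by_group(rows, n=3):
    """Return {group: [(word, count), ...]} with the n most frequent words per group."""
    counters = defaultdict(Counter)
    for group, text in rows:
        counters[group].update(tokenize(text))
    return {group: counter.most_common(n) for group, counter in counters.items()}


# Made-up sample rows (star rating, review text); not taken from the Yelp data.
sample_reviews = [
    (5, "Great food and great service"),
    (5, "The food was great"),
    (1, "Slow service and cold food"),
    (1, "Cold coffee, slow staff"),
]

result = top_words_by_group(sample_reviews, n=2)
for rating, words in sorted(result.items()):
    print(rating, words)
```

The same grouping idea applies to the other objectives by swapping the group key, for example a location or a user ID in place of the star rating.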