Text Clustering & Topic Modeling
Data from the Yelp Dataset Challenge: https://www.yelp.com/dataset/challenge
This study explores what makes a place good for a night out in America. To investigate this, I used the Yelp dataset and found some interesting insights. The review data comes from the official Yelp website, and the review files were extracted from the dataset. There are 2,000,447 reviews in total; for each review we keep the review text and the star rating given with it.
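Review records can be read with the standard library alone. The sketch below assumes the review file is in JSON-lines form (one JSON object per line) with `text` and `stars` fields, as in the Yelp dataset's review file; the function name `read_reviews` and the file name in the usage comment are hypothetical.

```python
import json


def read_reviews(lines, limit=None):
    """Collect (text, stars) from JSON-lines review records.

    Assumes each non-blank line is a JSON object with "text" and "stars"
    fields. `lines` can be an open file or any iterable of strings.
    """
    texts, stars = [], []
    for line in lines:
        if limit is not None and len(texts) >= limit:
            break
        if not line.strip():  # skip blank lines
            continue
        record = json.loads(line)
        texts.append(record["text"])
        stars.append(record["stars"])
    return texts, stars


# Usage (file name is an assumption; adjust to your download):
# with open("yelp_academic_dataset_review.json", encoding="utf-8") as f:
#     texts, stars = read_reviews(f)
```

Taking an iterable of lines rather than a path keeps memory use flat on a file with about two million records and makes the function easy to test on in-memory samples.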
- Latent Dirichlet Allocation (LDA)
- LDA model with TF-IDF and bag-of-words (BoW)
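Both variants listed above share one pipeline: tokenize the reviews, build either a bag-of-words (raw term counts) or a TF-IDF representation (counts reweighted by log(N / df)), and fit LDA. The sketch below is a dependency-free illustration, not the study's own code. It uses a hypothetical stop-word list and made-up toy reviews, computes both representations, and fits LDA with collapsed Gibbs sampling on the BoW counts only, because that sampler needs integer counts. gensim's `LdaModel` can also be trained on a corpus transformed by `TfidfModel`, which is one common way to build the TF-IDF variant.

```python
import math
import random
from collections import Counter

# Hypothetical, minimal stop-word list for the toy example.
STOP_WORDS = {"the", "a", "and", "at", "for", "was", "is", "to", "of"}


def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stop words."""
    words = (w.strip(".,!?;:") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def bag_of_words(docs):
    """BoW: map each tokenized document to a Counter of term -> raw count."""
    return [Counter(doc) for doc in docs]


def tf_idf(bows):
    """Weight each raw count by inverse document frequency: tf * log(N / df)."""
    n = len(bows)
    df = Counter(term for bow in bows for term in bow)
    return [{t: c * math.log(n / df[t]) for t, c in bow.items()} for bow in bows]


def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over integer BoW token counts."""
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})  # vocabulary size
    ndk = [[0] * k for _ in docs]              # document-topic counts
    nkw = [Counter() for _ in range(k)]        # topic-word counts
    nk = [0] * k                               # total tokens per topic
    z = []                                     # topic assigned to each token
    for d, doc in enumerate(docs):             # random initialization
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # p(z=j | rest) is proportional to
                # (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta).
                probs = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=probs)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    return ndk, nkw


# Toy, made-up reviews standing in for Yelp review text.
reviews = [
    "Great cocktails and live music late at night",
    "The bar had cheap beer and loud music",
    "Tacos were delicious and the salsa was fresh",
    "Fresh pasta and delicious pizza for dinner",
]
docs = [tokenize(r) for r in reviews]
bows = bag_of_words(docs)
tfidf = tf_idf(bows)
doc_topics, topic_words = lda_gibbs(docs, k=2)
for j, counts in enumerate(topic_words):
    top = [w for w, c in counts.most_common() if c > 0][:4]
    print(f"topic {j}:", top)
```

With only four toy reviews the topics are noisy; on real review text one would typically use a proper tokenizer and stop-word list, more iterations, and a tuned number of topics `k`.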