Yelp-Business-Analysis

Yelp publishes its business data online. I am interested in whether users' rating scores are consistent with what they write in their reviews, so I study this question from a data scientist's perspective. I also use the existing rating data to build a recommendation system for users.

Primary language: Jupyter Notebook

The reading sequence is:

  1. Yelp_Dataset-Data_Preprocessing
  2. Yelp_Dataset-EDA
  3. Yelp_Dataset-SetimentAnalysis
  4. Yelp_Dataset-Clustering
  5. Yelp_Dataset-Restaurant_Recommender

Summary of data analysis
Part one: Yelp_Dataset-Data_Preprocessing
■ Introduced the structure and content of the data.
■ Filtered the data by city, time and category for further analysis.
■ Saved the processed data.
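
The filtering step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the field names (`business_id`, `city`, `categories`, `date`), the toy records and the helper `filter_reviews` are assumptions, and the filter values (Las Vegas, 2017, Restaurants) are inferred from the questions explored in the EDA notebook.

```python
# Toy records shaped like Yelp business and review entries (field names assumed).
businesses = [
    {"business_id": "b1", "city": "Las Vegas", "categories": "Restaurants, Mexican"},
    {"business_id": "b2", "city": "Las Vegas", "categories": "Nail Salons"},
    {"business_id": "b3", "city": "Phoenix", "categories": "Restaurants, Pizza"},
]
reviews = [
    {"business_id": "b1", "date": "2017-03-14", "stars": 5, "text": "Great tacos"},
    {"business_id": "b1", "date": "2016-11-02", "stars": 2, "text": "Slow service"},
    {"business_id": "b2", "date": "2017-06-30", "stars": 4, "text": "Nice manicure"},
    {"business_id": "b3", "date": "2017-01-09", "stars": 3, "text": "Average pizza"},
]


def filter_reviews(businesses, reviews, city, year, category):
    """Keep reviews written in `year` for businesses in `city` tagged `category`."""
    wanted_ids = {
        b["business_id"]
        for b in businesses
        if b["city"] == city
        # `categories` is assumed to be a comma-separated string (possibly missing).
        and category in [c.strip() for c in (b["categories"] or "").split(",")]
    }
    return [
        r
        for r in reviews
        if r["business_id"] in wanted_ids and r["date"].startswith(str(year))
    ]


filtered = filter_reviews(businesses, reviews, "Las Vegas", 2017, "Restaurants")
print(filtered)
```

On the real dataset the same logic would more likely be expressed with a dataframe library, but the three conditions (city, year, category) stay the same.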

Part two: Yelp_Dataset-EDA
I explored three questions.
■ What are the top 50 restaurants with the most reviews in Las Vegas in 2017?
■ Do more reviews mean better quality?
■ What is the most popular restaurant style in Las Vegas in 2017?
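
The first two questions come down to counting reviews per restaurant and comparing those counts with average ratings. The sketch below shows one way to do that with the standard library. The restaurant names, the toy ratings and the helpers `top_reviewed` and `mean_stars` are hypothetical, and the notebook may compute these figures differently.

```python
from collections import Counter, defaultdict

# (restaurant name, star rating) pairs; hypothetical toy data.
eda_reviews = [
    ("Taco Hut", 5), ("Taco Hut", 4), ("Taco Hut", 5),
    ("Noodle Bar", 3), ("Noodle Bar", 4),
    ("Pizza Place", 2),
]


def top_reviewed(reviews, n):
    """Return the n restaurants with the most reviews as (name, count) pairs."""
    counts = Counter(name for name, _ in reviews)
    return counts.most_common(n)


def mean_stars(reviews):
    """Return the average star rating per restaurant."""
    totals = defaultdict(lambda: [0, 0])  # name -> [sum of stars, review count]
    for name, stars in reviews:
        totals[name][0] += stars
        totals[name][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


top = top_reviewed(eda_reviews, 2)
averages = mean_stars(eda_reviews)
for name, count in top:
    print(name, count, round(averages[name], 2))
```

Setting `n` to 50 on the filtered Las Vegas data would give the ranking asked for in the first question. Putting review counts next to average ratings is a starting point for the second.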

Part three: Yelp_Dataset-SetimentAnalysis
■ Transformed unstructured review text into feature vectors using NLP techniques such as lemmatization and TF-IDF.
■ Performed sentiment analysis with a Random Forest to predict users' rating scores from their reviews.
■ Found that users' reviews do not accurately explain their rating scores.
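
To make the vectorization step concrete, here is a minimal pure-Python TF-IDF sketch. It covers only the text-to-vector step and makes several assumptions: it lowercases and splits on letters instead of lemmatizing, it uses the plain weighting tf × log(N / df), and the helpers `tokenize` and `tfidf_vectors` are hypothetical names. It is not the notebook's pipeline, and the Random Forest step is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (no lemmatization)."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf = count / doc length, idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


docs = ["The food was great", "The service was slow", "Great food, great price"]
vocab, vectors = tfidf_vectors(docs)
for doc, vec in zip(docs, vectors):
    print(doc, [round(v, 3) for v in vec])
```

Library vectorizers typically apply smoothing and normalization, so their numbers will differ from this plain variant. The resulting vectors are what a classifier such as a Random Forest would take as input, with the star rating as the label.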

Part four: Yelp_Dataset-Clustering
■ Identified the most common words in users' reviews within each group using K-Means clustering.
■ Suggested replacing the current five-star rating scheme with a three-class rating scheme.
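
As a rough illustration of the clustering technique, the sketch below implements the basic K-Means loop (assign each point to its nearest centroid, then recompute the centroids) in plain Python. The 2-D toy points stand in for review feature vectors, the initialization simply takes the first `k` points, and the function names are hypothetical. The notebook most likely relies on a library implementation with a better initialization.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Basic K-Means. Centroids start at the first k points (naive, deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

With review vectors as input, the largest components of each centroid indicate the words that characterize that group.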

Part five: Yelp_Dataset-Restaurant_Recommender
■ Constructed a restaurant recommender system using collaborative filtering and matrix factorization based on clients' past visits and ratings.
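
The matrix factorization idea can be sketched as follows. The example learns low-rank user and item factors from observed (user, item, rating) triples with stochastic gradient descent, then predicts a rating the user has not given. The training method, the hyperparameters, the toy ratings and the function names are all assumptions made for illustration. The source does not say which algorithm or library the notebook uses.

```python
import random


def factorize(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    P = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating of item i by user u."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


# (user index, restaurant index, stars); hypothetical toy data with gaps.
observed = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1),
            (2, 1, 5), (2, 2, 2), (3, 0, 5), (3, 2, 1)]

P0, Q0 = factorize(observed, n_users=4, n_items=3, epochs=0)  # untrained baseline
P, Q = factorize(observed, n_users=4, n_items=3)
print("RMSE before training:", rmse(P0, Q0, observed))
print("RMSE after training:", rmse(P, Q, observed))
print("Predicted rating of restaurant 1 by user 1:", predict(P, Q, 1, 1))
```

A recommender built this way would rank, for each user, the restaurants they have not rated by predicted score and suggest the top few.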