Yelp Review Rating Prediction

Summary

The goal of this project was to predict the star rating of Yelp reviews from the review text. We built the following models, each of which analyzes the review text to predict its star rating:

Baseline Model: Predicts the most common rating, 3 stars, for every review.
Term Frequency Model: Predicts the rating from how often each word occurs in the review.
LDA + Sentiment Model: Extracts topics from the review text with Latent Dirichlet Allocation (LDA), adds a sentiment layer, and predicts the rating from the topics and sentiment associated with the review.
NMF + Sentiment Model: Extracts topics from the review text with Non-negative Matrix Factorization (NMF), adds a sentiment layer, and predicts the rating from the topics and sentiment associated with the review.
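The README does not say which classifier the Term Frequency Model feeds its word counts into. The notebook list under Code names a Naive-Bayes Model, so the sketch below assumes a multinomial Naive Bayes classifier over per-review term counts with add-one smoothing. It is a hand-rolled, standard-library illustration; the names `TermFrequencyNB` and `tokenize` and the toy reviews are hypothetical, not taken from the project. scikit-learn's `CountVectorizer` and `MultinomialNB` implement the same technique in production form.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


class TermFrequencyNB(object):
    """Hypothetical multinomial Naive Bayes over per-review term counts."""

    def fit(self, texts, stars):
        self.class_counts = Counter(stars)          # rating -> number of reviews
        self.term_counts = defaultdict(Counter)     # rating -> term -> count
        for text, star in zip(texts, stars):
            self.term_counts[star].update(tokenize(text))
        self.vocab = {t for c in self.term_counts.values() for t in c}
        self.totals = {s: sum(c.values()) for s, c in self.term_counts.items()}
        return self

    def predict(self, text):
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -float("inf")
        for star in sorted(self.class_counts):
            # log P(rating) + sum of log P(word | rating), Laplace-smoothed
            score = math.log(self.class_counts[star] / float(n))
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.term_counts[star][tok] + 1.0) /
                                      (self.totals[star] + v))
            if score > best_score:
                best, best_score = star, score
        return best


# Toy training data for illustration only (not Yelp reviews)
train_texts = ["great food and friendly staff",
               "terrible service and cold food",
               "food was fine nothing special"]
train_stars = [5, 1, 3]
model = TermFrequencyNB().fit(train_texts, train_stars)
print(model.predict("friendly staff great food"))  # 5
```

Each review is scored by the log prior of a rating plus the summed log likelihood of its words under that rating, and the highest-scoring rating is returned.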

We achieved an accuracy of 61% in predicting review star ratings.
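The baseline and the accuracy figure can be made concrete with a short sketch. It assumes accuracy means exact-match accuracy (the fraction of reviews whose predicted star rating equals the true one), since the README does not define the metric. `MajorityBaseline` and `accuracy` are hypothetical names, and the ratings below are toy values, not Yelp data.

```python
from collections import Counter


class MajorityBaseline(object):
    """Predict the most frequent training rating for every review."""

    def fit(self, stars):
        # most_common(1) returns [(rating, count)] for the top rating
        self.rating = Counter(stars).most_common(1)[0][0]
        return self

    def predict(self, texts):
        # The review text is ignored entirely
        return [self.rating for _ in texts]


def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted rating exactly matches the true one."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / float(len(y_true))


# Toy ratings for illustration only
train_stars = [3, 3, 4, 5, 3, 1]
test_texts = ["ok", "great", "meh", "awful"]
test_stars = [3, 5, 3, 1]
baseline = MajorityBaseline().fit(train_stars)
preds = baseline.predict(test_texts)
print(accuracy(test_stars, preds))  # 0.5
```

Because the baseline ignores the text, its accuracy is simply the share of evaluation reviews that carry the predicted rating; a text-based model is only useful if it beats that number.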

Code

Most files are IPython notebooks (.ipynb extension with JSON data).
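Because a notebook is plain JSON, its code can be inspected without starting IPython. A minimal sketch, assuming either the nbformat 3 layout written by IPython 0.13 through 2.x (cells nested under `worksheets`, code cell source under `input`) or the later nbformat 4 layout (top-level `cells`, source under `source`); `code_cells` and `load_notebook` are hypothetical helpers, not part of this repository.

```python
import json


def load_notebook(path):
    """Parse an .ipynb file; notebooks are plain JSON documents."""
    with open(path) as f:
        return json.load(f)


def code_cells(nb):
    """Return the source of every code cell in a parsed notebook dict."""
    if "worksheets" in nb:  # nbformat 3 (IPython 0.13 to 2.x)
        cells = [c for ws in nb["worksheets"] for c in ws.get("cells", [])]
        key = "input"
    else:  # nbformat 4 (IPython 3 and Jupyter)
        cells = nb.get("cells", [])
        key = "source"
    sources = []
    for cell in cells:
        if cell.get("cell_type") == "code":
            src = cell.get(key, "")
            # Source may be stored as a list of lines or as a single string
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


# Small in-memory notebook in the nbformat 3 layout
nb3 = {"nbformat": 3, "worksheets": [{"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "input": ["import pandas as pd\n", "df = pd.DataFrame()"]},
]}]}
print(code_cells(nb3))
```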

The following software and modules are used in at least one of the notebooks:

Python 2.7
NumPy
Pandas
SciPy
scikit-learn
NLTK
seaborn
Matplotlib
Gensim
IPython 0.13+
cPickle

You can view the notebooks in the IPython notebook viewer (see links below).

Data Exploration
All Models except NMF
- This includes the following models:
  - Baseline Model
  - Naive-Bayes Model
  - LDA Model
  - LDA + Sentiment Model
NMF Model
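The README does not describe the NMF + Sentiment Model beyond its one-line summary. As a minimal sketch, assuming a review-by-term count matrix and a hypothetical toy sentiment lexicon (`POSITIVE` and `NEGATIVE` below are illustrative, not the project's), the code factors the matrix with Lee and Seung's multiplicative updates for the squared Frobenius loss and appends a lexicon sentiment score to each review's topic weights. It is pure Python for portability; scikit-learn's `sklearn.decomposition.NMF` performs the factorization in practice, and the resulting feature vectors would then feed a rating classifier (not shown).

```python
import random


def transpose(a):
    """Transpose a list-of-lists matrix."""
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor nonnegative V (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb))


# Hypothetical toy lexicon; a real system would use a proper sentiment resource
POSITIVE = {"great", "delicious", "friendly", "amazing"}
NEGATIVE = {"rude", "cold", "slow", "terrible"}


def sentiment_score(text):
    """(pos - neg) / (pos + neg) in [-1, 1], or 0.0 if no lexicon words occur."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / float(pos + neg) if pos + neg else 0.0


def review_features(topic_weights, text):
    """Concatenate a review's topic weights with its sentiment score."""
    return list(topic_weights) + [sentiment_score(text)]


# Toy reviews for illustration only
reviews = ["great food friendly staff", "cold food slow service", "great service"]
vocab = sorted({t for r in reviews for t in r.split()})
v = [[r.split().count(t) for t in vocab] for r in reviews]
w, h = nmf(v, k=2)
features = [review_features(w[i], reviews[i]) for i in range(len(reviews))]
```

Multiplicative updates keep every entry of W and H nonnegative, which is what lets each row of W be read as a review's weight on each topic.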

Team Members:

Chetan Naik
Rakesh Chada

chetannaik/predict-review-rating
