/Y3P

Yelp Restaurant Rating Prediction

Primary language: Jupyter Notebook

Y3RP - Yelp Restaurant Review Rating Prediction

We work on the Yelp Dataset to predict ratings and sentiments (positive and negative) for restaurant reviews. There are 1 million reviews written in free text, which makes the problem challenging. Our project explores two main ideas. First, we study the performance of CNNs for text classification and show that a CNN model performs on par with the LSTMs traditionally used for text classification tasks, specifically rating prediction in the context of this project. Second, for the sentiment classification task we use only the aspect descriptors. For CNNs, this method performs on par with using the entire review.
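To make the "descriptors only" idea concrete, here is a minimal sketch of reducing a review to its adjectives, which is how the Findings section below describes the descriptors. It assumes the review has already been part-of-speech tagged with Penn Treebank tags; the `keep_adjectives` helper and the hand-tagged sample review are hypothetical and are not taken from the project notebooks.

```python
# Penn Treebank tags for adjectives (base, comparative, superlative).
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}


def keep_adjectives(tagged_tokens):
    """Return the lowercased adjectives from a list of (token, tag) pairs."""
    return [token.lower() for token, tag in tagged_tokens if tag in ADJECTIVE_TAGS]


# Hypothetical review, tagged by hand for illustration only.
tagged_review = [
    ("The", "DT"), ("pasta", "NN"), ("was", "VBD"), ("delicious", "JJ"),
    ("but", "CC"), ("the", "DT"), ("service", "NN"), ("was", "VBD"),
    ("slow", "JJ"), (".", "."),
]

adjectives = keep_adjectives(tagged_review)
print(adjectives)  # ['delicious', 'slow']
```

In practice the tags would come from a part-of-speech tagger, and the resulting adjective sequence would be embedded and fed to the classifier in place of the full review.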

Findings and Conclusion

In this project we used CNNs for text classification and compared the results with our LSTM and Naive Bayes baselines. We ran these tests for both the rating prediction and the sentiment classification tasks. We also proposed a method that uses adjectives as descriptors for aspects, primarily for sentiment classification. Based on the results from the previous section, the following conclusions can be drawn -

  • Using the entire review to predict star ratings gives better accuracy than using only the adjectives from the reviews. This can be attributed to the fact that the language of the review as a whole is more informative for identifying the subtle differences between ratings.
  • CNN models perform as well as the LSTM model for multiclass rating prediction using the entire review, and for the binary classification task using adjectives. CNNs work well on this problem because their convolution filters extract n-gram features at different positions of a sentence.
  • Using adjective embeddings to learn the sentiment performs on par, in terms of accuracy, with using the entire review text for CNNs. This is because the feature engineering done before training lets the model learn from the relevant information. The model is also simpler, as it does not need to learn long-term dependencies.
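The n-gram behaviour of convolution filters mentioned above can be illustrated with a small sketch in plain Python. It shows a single filter sliding over a sequence of word vectors, followed by ReLU and max-over-time pooling. The `conv1d_max_pool` function, the 2-dimensional embeddings, and the filter weights are all invented for illustration; the project's actual models are multi-filter CNNs with learned weights.

```python
def conv1d_max_pool(embeddings, filt):
    """Slide one filter over a sequence of word vectors and max-pool the result.

    embeddings: list of word vectors (each a list of floats), one per token.
    filt: list of `width` weight vectors; `width` is the n-gram size.
    Returns the largest ReLU-activated response over all window positions.
    """
    width = len(filt)
    if len(embeddings) < width:
        raise ValueError("sequence is shorter than the filter width")
    responses = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the current n-gram window.
        score = sum(
            w * x
            for w_vec, x_vec in zip(filt, window)
            for w, x in zip(w_vec, x_vec)
        )
        responses.append(max(0.0, score))  # ReLU
    return max(responses)  # max-over-time pooling


# Toy 2-dimensional embeddings, invented for illustration.
sentence = [[0.1, 0.0], [1.0, 0.0], [0.0, 1.0], [0.2, 0.1]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]  # matches the bigram at tokens 1-2

padding = [[0.0, 0.0]] * 3
print(conv1d_max_pool(sentence, bigram_filter))            # 2.0
print(conv1d_max_pool(padding + sentence, bigram_filter))  # 2.0
```

The pooled response is the same whether the matching bigram appears early or late in the sequence, which is why such filters can pick up informative phrases regardless of their position in a review.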

Some additional observations that can be made are as follows -

  • Additional experiments (included in the above section) indicate that LSTMs outperform CNNs for the binary classification task. LSTMs learn the dependencies between the aspects and descriptors as well as the long-term dependencies that capture the entire semantics of the sentence.
  • The experiments suggest that although using adjectives works as well as using the entire reviews for sentiment classification, for LSTMs the accuracy improves by a significant amount. For both the unbalanced and balanced datasets the improvement in accuracy is about 6%, which on a test set of 200,000 reviews means about 12,000 more reviews being correctly classified than before.
  • For problems with class imbalance, deep learning models that are not properly tuned tend to predict the majority class, no matter how complex they are.
  • From the distribution of ratings for restaurants it can be observed that people rated restaurants favorably. Manual review revealed that most of the negative reviews were due to poor service or unforeseen problems that are not likely to occur routinely.
  • The effects of data balancing are not uniform. We therefore believe that this technique is not very reliable within the scope of this project.
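The observation about majority-class predictions is easier to judge against a baseline. The sketch below computes the accuracy of a classifier that always predicts the most frequent label, which is the figure an imbalanced model has to beat. The `majority_baseline_accuracy` helper and the 80/20 label split are hypothetical and do not reflect the actual Yelp label distribution.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Hypothetical label distribution, not taken from the Yelp data.
labels = ["positive"] * 80 + ["negative"] * 20
print(majority_baseline_accuracy(labels))  # 0.8
```

A model that reports accuracy close to this baseline may simply have collapsed to the majority class, so per-class metrics are worth checking alongside overall accuracy.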

For this project we tried several methods and preprocessing steps to improve accuracy on the rating prediction task. We tried classical machine learning methods such as Random Forest and Decision Tree classifiers. The inputs to these methods were the TF-IDF weights of the reviews. However, these methods performed poorly. We then explored deep learning methods, namely CNNs and TCNs. TCNs have a more robust architecture than CNNs and we expected them to work better, but we got the opposite result. With CNNs we tried different architectures, notably multi-filter CNNs for text classification and CNNs with trainable embeddings. To handle class imbalance we tried weighted loss functions as well as upsampling and downsampling the data. The weighted loss function increased accuracy by a small amount.
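As an illustration of the weighted loss idea, the sketch below derives per-class weights that are inversely proportional to class frequency. The README does not say which weighting scheme was used; this is an assumption, using the common n_samples / (n_classes * class_count) heuristic (the same formula scikit-learn documents for `class_weight="balanced"`). The `inverse_frequency_weights` helper and the sample ratings are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical star ratings, skewed towards favorable reviews.
ratings = [5] * 6 + [4] * 3 + [1] * 1
weights = inverse_frequency_weights(ratings)

for star in sorted(weights):
    print(star, round(weights[star], 3))
```

Rare classes receive larger weights, so each misclassified 1-star review contributes more to the loss than a misclassified 5-star review. Such a dictionary can typically be passed to a framework's class-weight option or used to scale per-sample losses.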
Future work on this project will involve methods for better handling class imbalance. Another area of future work builds on our idea of using adjectives as descriptors: we can work on the identification of aspect words. One more approach that could be tried is to build a sentiment classifier from our dataset and use the sentiments of the adjectives related to our aspects in a non-deep-learning ML model such as Random Forest.