A simple NLP-Classifier that classifies the reviews as good or bad using the bag of words model

Restaurant Review Classifier

This is a simple NLP project based on the NLP section of the A-Z Machine Learning Course on Udemy.

The objective of this exercise is to identify the best model for classifying restaurant review comments. We clean the dataset and convert each review into a vector according to the bag of words model.

Preprocessing

Dataset
|   | Review | Liked |
|---|--------|-------|
| 0 | Wow... Loved this place. | 1 |
| 1 | Crust is not good. | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 |
| 3 | Stopped by during the late May bank holiday of... | 1 |
| 4 | The selection on the menu was great and so wer... | 1 |

The dataset contains the review string followed by a binary flag indicating whether the user liked the restaurant (1) or not (0).

Steps taken
  • Removing punctuation and symbols
  • Removing the stop words
  • Stemming the words and tokenizing each review
  • Building the vectors from the individual reviews
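The steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the stop-word list is a tiny made-up sample, `crude_stem` is a hypothetical stand-in for a real stemmer, and all function names are assumptions.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list. A real project would use a
# fuller list. "not" is deliberately left out so negations survive cleaning.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "this", "of", "on", "so", "just"}

def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def clean_review(text):
    """Return the stemmed, lower-case tokens of one review."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)  # drop punctuation and symbols
    tokens = letters_only.lower().split()           # tokenize on whitespace
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def build_vocabulary(token_lists):
    """Map every distinct token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one review."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocabulary]

reviews = ["Wow... Loved this place.", "Crust is not good."]
token_lists = [clean_review(r) for r in reviews]
vocabulary = build_vocabulary(token_lists)
vectors = [vectorize(tokens, vocabulary) for tokens in token_lists]
```

Each row of `vectors` has one column per vocabulary word, which is the form a classifier expects as input.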

Summary of Gaussian Naive Bayes

The confusion matrix is:

$$\begin{bmatrix} 55 & 42 \\ 12 & 91 \end{bmatrix}$$

$accuracy = 0.73$

$precision = 0.567010309278$

$recall = 0.820895522388$

$F1score = 0.670731707317$
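For reference, here is a small sketch of how the four figures follow from the confusion matrix. The function name `summarize` is hypothetical, and the cell layout `[[TP, FP], [FN, TN]]` is an assumption: it is the reading that reproduces the numbers reported above.

```python
def summarize(matrix):
    """Compute summary metrics from a 2x2 confusion matrix.

    Assumption: cells are read as [[TP, FP], [FN, TN]], the reading that
    reproduces the figures reported in this document.
    """
    (tp, fp), (fn, tn) = matrix
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

gaussian_nb = summarize([[55, 42], [12, 91]])
```

The same function applies unchanged to the matrices of the other models.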

Summary of Decision Tree Classifier

The confusion matrix is:

$$\begin{bmatrix} 74 & 23 \\ 35 & 68 \end{bmatrix}$$

$accuracy = 0.71$

$precision = 0.762886597938$

$recall = 0.678899082569$

$F1score = 0.718446601942$

Summary of Random Forest Classifier

The confusion matrix is:

$$\begin{bmatrix} 87 & 10 \\ 46 & 57 \end{bmatrix}$$

$accuracy = 0.72$

$precision = 0.896907216495$

$recall = 0.654135338346$

$F1score = 0.75652173913$

Predictor

A sample predictor was created for use in our Django app. It classifies the comment with each of the three models we tried and averages their outputs to produce the final prediction. The predictor takes its input as a string.
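The averaging logic might look like the sketch below. It is an illustration under stated assumptions: the three `model_*` functions are hypothetical stand-ins for the trained classifiers, `simple_tokens` stands in for the real preprocessing, and the 0.5 cut-off applied to the average is an assumption, since the text only says the results are averaged.

```python
def ensemble_predict(review, models, preprocess):
    """Average the 0/1 outputs of several models and threshold the mean."""
    features = preprocess(review)
    votes = [model(features) for model in models]
    average = sum(votes) / len(votes)
    # Assumption: an average of at least 0.5 counts as "liked".
    return 1 if average >= 0.5 else 0

# Hypothetical stand-ins for the three trained classifiers.
def model_a(tokens):
    return 1 if "loved" in tokens or "great" in tokens else 0

def model_b(tokens):
    return 0 if "not" in tokens else 1

def model_c(tokens):
    return 1 if "good" in tokens else 0

def simple_tokens(text):
    """Hypothetical stand-in for the real cleaning and vectorizing step."""
    return text.lower().replace(".", " ").split()

models = [model_a, model_b, model_c]
liked = ensemble_predict("Loved this place.", models, simple_tokens)
disliked = ensemble_predict("Crust is not good.", models, simple_tokens)
```

With three binary models the average can never be exactly 0.5, so the cut-off amounts to a majority vote.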

Summary of Predictor function

The confusion matrix is:

$$\begin{bmatrix} 82 & 15 \\ 32 & 71 \end{bmatrix}$$

$accuracy = 0.765$

$precision = 0.845360824742$

$recall = 0.719298245614$

$F1score = 0.777251184834$

Pickling for use in our Django project

The three trained models were pickled using Python's pickle module and then loaded inside the Django project.
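A minimal sketch of the save and load round trip is shown below. The dictionary entries are placeholder stand-ins for the trained models, and the file names and helper names (`save_models`, `load_model`) are assumptions chosen for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-ins for the three trained models. In the real project
# these would be the fitted classifier objects.
trained_models = {
    "gaussian_nb": {"kind": "placeholder for Gaussian Naive Bayes"},
    "decision_tree": {"kind": "placeholder for Decision Tree"},
    "random_forest": {"kind": "placeholder for Random Forest"},
}

def save_models(models, directory):
    """Write each model to <directory>/<name>.pkl and return the paths."""
    paths = {}
    for name, model in models.items():
        path = Path(directory) / f"{name}.pkl"
        with open(path, "wb") as fh:
            pickle.dump(model, fh)
        paths[name] = path
    return paths

def load_model(path):
    """Read one pickled model back from disk."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# A temporary directory keeps the example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_models(trained_models, tmp)
    restored = {name: load_model(path) for name, path in saved.items()}
```

In a Django project the load step would typically run once at start-up rather than on every request. Pickle files should only be loaded from a trusted source, because unpickling can execute arbitrary code.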

Conclusion

In conclusion, none of these methods classifies the reviews perfectly. Among the individual models, the Random Forest Classifier achieved the best F1 score (0.757), although Gaussian Naive Bayes had slightly higher accuracy (0.73 versus 0.72). An even better result was obtained from the predictor function, which aggregates the three classifiers (accuracy 0.765, F1 score 0.777). Another factor to consider is that these models were built on a very limited dataset and have their limitations. Altogether, we are able to get fairly good results for a basic implementation on a web