/reviews_polarity

Predicting polarity of Amazon user reviews using Deep Learning 🎭



Polarity prediction using Deep Learning 😡😊


This project aims to predict review polarity from the user reviews in the Spanish Amazon Review Corpus. The previous project, which covers the EDA and ML modelling on the same data, can be found here

-> Project status: [ Completed ]




Project description

(Back to top)

Customer satisfaction matters because it indicates how likely a customer is to make a purchase in the future. Asking customers to rate their degree of satisfaction is a good way to see whether they will become regular customers or even brand advocates.

This project tries to improve on the accuracy obtained in the previous project, in which some traditional machine learning techniques were tested. The literature points out that some neural network architectures work very well for natural language processing, so the following hypothesis is raised:

Will a neural network with an LSTM architecture improve on the accuracy of a Linear SVC model with the current data?

Methods used

  • Corpus preprocessing
  • Feature engineering
  • Sentiment Analysis
  • Machine learning
  • Deep Learning

Technologies

  • Python
  • Numpy, Pandas, Scipy
  • NLTK
  • Matplotlib, Seaborn
  • Scikit Learn
  • Keras, Tensorflow

Results

(Back to top)

Compared with the Linear SVC model, the LSTM neural network improved accuracy by 0.002 on the training set and by almost 0 on the validation set. The neural network is slightly less accurate on negative reviews and slightly more accurate on positive ones. One could say that it predicts the classes slightly better in general, but the difference is so small that it could easily change by adjusting hyperparameters in either model.
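The per-class comparison described above can be sketched as follows. This is a minimal illustration, not the project's evaluation code: the function name `per_class_accuracy` and the toy labels and predictions are assumptions made up for the example and are not the project's data or results.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return {label: share of examples with that true label predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


# Toy labels (0 = negative, 1 = positive); illustrative only, not project data.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
svc_pred = [0, 0, 0, 1, 1, 1, 0, 0]   # hypothetical Linear SVC predictions
lstm_pred = [0, 0, 1, 1, 1, 1, 1, 0]  # hypothetical LSTM predictions

svc_acc = per_class_accuracy(y_true, svc_pred)
lstm_acc = per_class_accuracy(y_true, lstm_pred)

for label in sorted(svc_acc):
    print(f"class {label}: SVC={svc_acc[label]:.2f}  LSTM={lstm_acc[label]:.2f}")
```

In this toy case both models get the same overall accuracy while trading accuracy between the two classes, which is why looking at each class separately is useful.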

In short, the hypothesis raised at the beginning of the project is rejected: using an LSTM neural network does not improve on the accuracy of an SVM with a linear kernel.


It should be noted that overfitting was well controlled in both models, since the accuracy on train and test does not differ much. Both models are therefore viable options for solving the problem, but in the author's opinion the Linear SVC model is preferable for its simplicity and interpretability.


Next steps

(Back to top)

The neural network still has room for improvement: it is possible to test pre-trained models, explore the hyperparameters more exhaustively, or use embeddings trained on a Spanish corpus.
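A more exhaustive hyperparameter exploration could start from a plain grid search like the sketch below. Everything here is an assumption for illustration: the function `grid_search`, the parameter names `lstm_units`, `dropout` and `learning_rate`, their candidate values, and the placeholder `dummy_score`, which stands in for training the model and returning its validation accuracy.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        # Strict comparison keeps the first combination when scores tie.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; names and values are not taken from the project.
param_grid = {
    "lstm_units": [32, 64, 128],
    "dropout": [0.2, 0.5],
    "learning_rate": [1e-3, 1e-4],
}


def dummy_score(params):
    # Placeholder objective so the sketch runs without training anything.
    # In practice this would train the network and return validation accuracy.
    return -abs(params["lstm_units"] - 64) - abs(params["dropout"] - 0.2)


best_params, best_score = grid_search(param_grid, dummy_score)
print(best_params, best_score)
```

The grid grows multiplicatively with each added parameter, so for larger spaces a random search over the same dictionary is a common alternative.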

Another way to improve the current project would be to include data about the product's category or its price.


Contact

(Back to top)

You can visit my Personal Website, follow me on Twitter, connect with me on LinkedIn, or check out the rest of my projects on my GitHub.


