Amazon Food Reviews Analysis

In this project we performed exploratory data analysis, data preprocessing, feature engineering and model building on Amazon food reviews. We used various models such as KNN, Naive Bayes, Logistic Regression, Decision Trees, clustering (K-means, Agglomerative, DBSCAN) and XGBoost, with featurization techniques such as Bag of Words (BoW), TF-IDF, Word2Vec, average Word2Vec and TF-IDF weighted Word2Vec. We also tuned the hyperparameters of every model and plotted various graphs to check model stability, convergence of hyperparameter values, underfitting and overfitting.
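As a rough sketch of the BoW, TF-IDF, average Word2Vec and TF-IDF weighted Word2Vec featurizations mentioned above, the pure-Python code below builds each representation on toy text. The tokenizer, the toy embedding vectors and all function names are illustrative assumptions, not the project's code; a real pipeline would more likely use scikit-learn's `CountVectorizer` and `TfidfVectorizer` and word vectors trained with gensim's `Word2Vec`.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately simple tokenizer).
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(docs):
    # Sorted vocabulary so that feature columns have a stable order.
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def bow(docs, vocab):
    # Bag of Words: raw term counts per document.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[w] for w in vocab])
    return rows

def idf(docs, vocab):
    # Smoothed IDF, the same formula as scikit-learn's default: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    doc_sets = [set(tokenize(d)) for d in docs]
    return {w: math.log((1 + n) / (1 + sum(w in s for s in doc_sets))) + 1 for w in vocab}

def tfidf(docs, vocab):
    # TF-IDF: term count times IDF, then L2-normalised per document.
    weights = idf(docs, vocab)
    rows = []
    for counts in bow(docs, vocab):
        row = [c * weights[w] for c, w in zip(counts, vocab)]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows

def avg_w2v(doc, embeddings, dim):
    # Average Word2Vec: mean of the vectors of tokens that have an embedding.
    vecs = [embeddings[t] for t in tokenize(doc) if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def tfidf_w2v(doc, embeddings, dim, weights):
    # TF-IDF weighted Word2Vec: token vectors averaged with weight tf * idf.
    counts = Counter(tokenize(doc))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, tf in counts.items():
        if tok in embeddings and tok in weights:
            w = tf * weights[tok]
            total = [a + w * b for a, b in zip(total, embeddings[tok])]
            weight_sum += w
    return [x / weight_sum for x in total] if weight_sum else total

# Hypothetical toy reviews and 2-d embeddings, for illustration only.
docs = ["Great taste, great price", "Stale and bland taste"]
vocab = build_vocab(docs)
toy_embeddings = {"great": [1.0, 0.0], "taste": [0.0, 1.0]}
print(bow(docs, vocab))  # [[0, 0, 2, 1, 0, 1], [1, 1, 0, 0, 1, 1]]
```

Because TF-IDF down-weights words that appear in every review ("taste" here), the weighted Word2Vec vector leans further towards the rarer, more discriminative words than the plain average does.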

Data Source and Information

This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plain-text review. It also includes reviews from all other Amazon categories.

Attribute Information

  • ProductId - unique identifier for the product
  • UserId - unique identifier for the user
  • ProfileName - profile name of the user
  • HelpfulnessNumerator - number of users who found the review helpful
  • HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
  • Score - rating between 1 and 5
  • Time - timestamp for the review
  • Summary - brief summary of the review
  • Text - text of the review
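A minimal loading sketch, assuming the data is a CSV file whose header uses the attribute names above and that Time is a Unix timestamp in seconds. Because HelpfulnessNumerator counts a subset of the voters in HelpfulnessDenominator, a row with numerator greater than denominator is inconsistent, so the sketch drops it. The function name `load_reviews` and the sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

def load_reviews(fileobj):
    """Parse review rows and keep only those with consistent helpfulness counts."""
    reviews = []
    for row in csv.DictReader(fileobj):
        num = int(row["HelpfulnessNumerator"])
        den = int(row["HelpfulnessDenominator"])
        # The numerator is a subset of the voters, so it can never exceed the denominator.
        if num > den:
            continue
        reviews.append({
            "ProductId": row["ProductId"],
            "UserId": row["UserId"],
            "Score": int(row["Score"]),
            # Assumes Time is a Unix timestamp in seconds.
            "Date": datetime.fromtimestamp(int(row["Time"]), tz=timezone.utc).date(),
            # Fraction of voters who found the review helpful; None when nobody voted.
            "Helpfulness": num / den if den else None,
            "Summary": row["Summary"],
            "Text": row["Text"],
        })
    return reviews

# Hypothetical sample rows, for illustration only.
sample = io.StringIO(
    "ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,"
    "Score,Time,Summary,Text\n"
    "P1,U1,Ann,3,4,5,1325376000,Tasty,Loved it\n"
    "P2,U2,Bob,5,2,1,1325376000,Bad,Stale chips\n"  # dropped: 5 > 2
)
reviews = load_reviews(sample)
```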

Objective

To classify the sentiment of a given review.
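The dataset has no explicit sentiment column, so a label has to be derived from Score. One common convention, assumed here rather than taken from this project, treats 4-5 stars as positive, 1-2 stars as negative and drops 3-star reviews as neutral. The helper names below are hypothetical.

```python
def score_to_label(score):
    """Map a 1-5 star rating to a sentiment label (assumed thresholds).

    Returns 1 for positive (4-5 stars), 0 for negative (1-2 stars) and
    None for 3 stars, which is treated as neutral and dropped.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"Score must be between 1 and 5, got {score}")
    if score > 3:
        return 1
    if score < 3:
        return 0
    return None

def label_reviews(scored_texts):
    # Keep (text, label) pairs for non-neutral reviews only.
    labelled = []
    for text, score in scored_texts:
        label = score_to_label(score)
        if label is not None:
            labelled.append((text, label))
    return labelled
```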

Algorithms

t-distributed Stochastic Neighbor Embedding (t-SNE)

K-Nearest Neighbours (KNN)

Naive Bayes

Logistic Regression

Support Vector Machine

Decision Trees

XGBoost, Random Forest

K-means, Agglomerative and DBSCAN clustering

Truncated SVD

Long Short-Term Memory (LSTM) networks
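As one hedged illustration of the per-model hyperparameter tuning described at the top, the sketch below picks k for a KNN classifier by validation accuracy. Cosine similarity, the candidate k values and the toy 2-d vectors are assumptions, not the project's settings; in practice scikit-learn's `KNeighborsClassifier` combined with `GridSearchCV` is the usual tool.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity; 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train_X, train_y, x, k):
    # Rank training points by similarity to x and take a majority vote over the top k.
    ranked = sorted(range(len(train_X)), key=lambda i: cosine(train_X[i], x), reverse=True)
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, val_X, val_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y))
    return hits / len(val_y)

def tune_k(train_X, train_y, val_X, val_y, candidates):
    # Return the k with the best validation accuracy (first one on ties)
    # together with every score, e.g. for plotting accuracy against k.
    scores = {k: accuracy(train_X, train_y, val_X, val_y, k) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])
    return best, scores

# Hypothetical toy vectors (label 1 = positive, 0 = negative).
train_X = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
train_y = [1, 1, 1, 0, 0]
val_X = [[1.0, 0.05], [0.05, 1.0]]
val_y = [1, 0]
best_k, scores = tune_k(train_X, train_y, val_X, val_y, [1, 3, 5])
```

On this toy data k = 5 votes over the whole training set and misclassifies the negative example, a small instance of the underfitting that the accuracy-versus-k plots are meant to reveal.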