Amazon-Fine-Food-Reviews-

Using KNN and Bag of Words to predict the polarity of Reviews

Primary Language: Jupyter Notebook

  • The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon.
  • Number of reviews: 568,454
  • Number of users: 256,059
  • Number of products: 74,258
  • Timespan: Oct 1999 - Oct 2012
  • Number of attributes/columns in data: 10

Attribute Information:

  1. Id
  2. ProductId - unique identifier for the product
  3. UserId - unique identifier for the user
  4. ProfileName
  5. HelpfulnessNumerator - number of users who found the review helpful
  6. HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
  7. Score - rating between 1 and 5
  8. Time - timestamp for the review
  9. Summary - brief summary of the review
  10. Text - text of the review

Objective:

Given a review, determine whether the review is positive (rating of 4 or 5) or negative (rating of 1 or 2).
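
The labeling rule above can be sketched as a small function. This is a minimal illustration, not the notebook's code: the function name `score_to_polarity` and the sample rows are hypothetical, and treating a score of 3 as neutral (and dropping it) is an assumption, since the objective only defines labels for scores 1, 2, 4 and 5.

```python
from typing import Optional


def score_to_polarity(score: int) -> Optional[str]:
    """Map a 1-5 star Score to a polarity label."""
    if score >= 4:
        return "positive"
    if score <= 2:
        return "negative"
    # Assumption: a score of 3 is neutral and gets no label.
    return None


# Hypothetical rows for illustration only, not taken from the dataset.
rows = [
    {"Score": 5, "Text": "Great coffee"},
    {"Score": 3, "Text": "It was okay"},
    {"Score": 1, "Text": "Arrived stale"},
]

# Keep only the reviews that receive a label.
labeled = [
    (row["Text"], score_to_polarity(row["Score"]))
    for row in rows
    if score_to_polarity(row["Score"]) is not None
]
```

With this rule, the neutral row is filtered out and only the clearly positive and clearly negative reviews remain for training.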

Note: I have used only 5,000 of the reviews for this purpose because of computational constraints.
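
To give a feel for the approach, here is a from-scratch sketch of Bag of Words plus KNN in plain Python. It is an illustration under stated assumptions, not the notebook's implementation: the helper names (`tokenize`, `bag_of_words`, `euclidean`, `knn_predict`), the toy reviews, the Euclidean distance, and `k=3` are all chosen here for the example. It also hints at why a smaller sample helps: each prediction compares the query against every training review.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Represent a review as a mapping from word to count."""
    return Counter(tokenize(text))


def euclidean(a, b):
    """Euclidean distance between two word-count vectors."""
    keys = set(a) | set(b)
    # Counter returns 0 for words that are missing from a vector.
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def knn_predict(train, query_text, k=3):
    """Majority vote among the k training reviews closest to the query."""
    query = bag_of_words(query_text)
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy reviews, not taken from the dataset.
toy_reviews = [
    ("great taste and great price", "positive"),
    ("my dog loves these treats", "positive"),
    ("delicious coffee, will buy again", "positive"),
    ("stale and awful taste", "negative"),
    ("arrived broken, waste of money", "negative"),
    ("terrible coffee, awful smell", "negative"),
]
train = [(bag_of_words(text), label) for text, label in toy_reviews]

prediction = knn_predict(train, "awful stale coffee", k=3)
```

In practice, a library such as scikit-learn (`CountVectorizer` and `KNeighborsClassifier`) would typically replace these hand-written helpers.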

Check out the notebook for the full end-to-end implementation with detailed explanations.