Amazon Food Reviews


This is a Kaggle dataset, and most of the code was obtained from a Kaggle participant as I learned how to use TF-IDF appropriately. I went through it step by step to understand the findings. Later, after finishing the code, I realized I had not taken notes because I was deeply focused on how the code works.


The dataset consists of ~500,000 reviews of fine foods from Amazon, spanning a period of more than 10 years, from Oct 1999 to Oct 2012.


Finding TF-IDF terms related to good, OK, and bad reviews

Data include:

  • Reviews from Oct 1999 - Oct 2012
  • 568,454 reviews
  • 256,059 users
  • 74,258 products
  • 260 users with > 50 reviews


Goal: to determine whether a review is positive or negative and build a machine learning model around it.
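As a rough illustration of the goal, the star rating can be turned into a sentiment label before any modelling. The cut-offs below (4-5 stars positive, 1-2 stars negative, 3 stars neutral) and the function name `label_review` are my assumptions for this sketch, not something fixed by the dataset.

```python
def label_review(score):
    """Map a 1-5 star Score to a coarse sentiment label (assumed cut-offs)."""
    if score >= 4:
        return "positive"
    if score <= 2:
        return "negative"
    return "neutral"  # 3-star reviews are treated as "ok" in this sketch


# One example per possible star rating, highest to lowest.
labels = [label_review(s) for s in [5, 4, 3, 2, 1]]
```

For a strictly positive/negative model, the neutral reviews could simply be dropped before training.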

Application used:

Python (Jupyter)

Object-based considerations

  • Attributes:
    1. ID
    2. Product
    3. ProfileName
    4. HelpfulnessNumerator
    5. HelpfulnessDenominator
    6. Score
    7. Time
    8. Summary
    9. Text
  • Methods:
    1. Finding the helpful rate and vote rate
    2. Investigating the scores from highest to lowest
    3. Sorting negative and positive words from the text and summary by their coefficients