This is a Kaggle dataset, and most of the code was obtained from a Kaggle participant while I learned how to use TF-IDF appropriately. I went through it step by step to understand the findings. After finishing the code, I realized I had not taken notes because I was so focused on how the code works.
The dataset consists of more than 500,000 reviews of fine foods from Amazon, covering a period of more than 10 years, from Oct 1999 to Oct 2012.
Finding TF-IDF terms related to good, OK, and bad reviews
Reviews from Oct 1999 - Oct 2012: 568,454 reviews, 256,059 users, 74,258 products, and 260 users with > 50 reviews.
The goal is to determine whether a review is positive or negative and to build a machine learning model around it.
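To make the idea concrete, here is a minimal sketch of TF-IDF features feeding a small logistic regression, written in plain Python so it runs without dependencies. It is not the code from the Kaggle notebook: the toy reviews, the unsmoothed `log(N / df)` IDF variant, and the gradient descent settings are all assumptions chosen for illustration. In practice a library such as scikit-learn would normally handle both steps.

```python
import math
from collections import Counter

# Hypothetical toy reviews with labels (1 = positive, 0 = negative).
reviews = [
    ("great taste and great price", 1),
    ("delicious snack love it", 1),
    ("love the taste", 1),
    ("awful taste and stale", 0),
    ("stale and bland waste of money", 0),
    ("awful waste", 0),
]

docs = [text.split() for text, _ in reviews]
labels = [label for _, label in reviews]
vocab = sorted({word for doc in docs for word in doc})

# Document frequency: number of reviews that contain each word.
df = Counter(word for doc in docs for word in set(doc))
n_docs = len(docs)

def tfidf(doc):
    """TF-IDF vector with tf = count / doc length and idf = log(N / df)."""
    counts = Counter(doc)
    return [counts[w] / len(doc) * math.log(n_docs / df[w]) for w in vocab]

X = [tfidf(doc) for doc in docs]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Logistic regression fitted by plain batch gradient descent (assumed settings).
weights = [0.0] * len(vocab)
bias = 0.0
learning_rate = 1.0
for _ in range(500):
    grad_w = [0.0] * len(vocab)
    grad_b = 0.0
    for x, y in zip(X, labels):
        error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
        grad_b += error
        for j, xi in enumerate(x):
            grad_w[j] += error * xi
    weights = [w - learning_rate * g / n_docs for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b / n_docs

# Sort words by coefficient: most negative first, most positive last.
ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1])
print("most negative:", [word for word, _ in ranked[:3]])
print("most positive:", [word for word, _ in ranked[-3:]])
```

Words that occur only in positive reviews end up with positive coefficients, and words that occur only in negative reviews end up with negative ones. This is what makes the sorted coefficient list readable as a ranking of "good" and "bad" words.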
- Attributes:
- ID
- Product
- ProfileName
- HelpfulnessNumerator
- HelpfulnessDenominator
- Score
- Time
- Summary
- Text
- Methods:
- Finding the helpful rate and vote rate
- Investigating the scores from highest to lowest
- Sorting negative and positive words from the text and summary by their respective coefficients
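The first two methods can be sketched with the standard library alone. The rows below are hypothetical, and the definitions are assumptions: "helpful rate" is taken to be HelpfulnessNumerator divided by HelpfulnessDenominator, and "vote rate" is taken to be the fraction of reviews that received at least one vote. The original notebook may define them differently.

```python
from collections import Counter

# Hypothetical rows using three of the attributes listed above.
rows = [
    {"HelpfulnessNumerator": 3, "HelpfulnessDenominator": 4, "Score": 5},
    {"HelpfulnessNumerator": 0, "HelpfulnessDenominator": 0, "Score": 1},
    {"HelpfulnessNumerator": 1, "HelpfulnessDenominator": 2, "Score": 5},
    {"HelpfulnessNumerator": 2, "HelpfulnessDenominator": 2, "Score": 3},
]

def helpful_rate(row):
    """Share of votes that marked the review helpful, or None if nobody voted."""
    votes = row["HelpfulnessDenominator"]
    return row["HelpfulnessNumerator"] / votes if votes else None

rates = [helpful_rate(row) for row in rows]

# Assumed meaning of "vote rate": fraction of reviews with at least one vote.
vote_rate = sum(1 for row in rows if row["HelpfulnessDenominator"] > 0) / len(rows)

# Count reviews per score and list them from the highest score to the lowest.
score_counts = sorted(Counter(row["Score"] for row in rows).items(), reverse=True)

print(rates)
print(vote_rate)
print(score_counts)
```

Returning `None` for reviews without votes avoids a division by zero and keeps unvoted reviews distinguishable from reviews that were voted unhelpful.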