The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. We had access to approximately 50 datasets, each containing reviews for a specific product category, from clothing apparel to wireless products. We picked one of these datasets: video games. We used PySpark to perform the ETL process: we extracted the dataset, transformed the data, connected to an AWS RDS instance, and loaded the transformed data into the database, which we accessed through pgAdmin. Next, we used Pandas to determine whether there is any bias toward favorable reviews from Vine members in our dataset. We then summarized the analysis for Jennifer to submit to the SellBy stakeholders.
Data sources: Amazon review dataset; Vine review dataset. Access to the Colab notebook is available on request.
- Total Vine reviews: 94
- 5-star Vine reviews: 48
- Percentage of 5-star Vine reviews: ~51.1%
- Total non-Vine reviews: 40471
- 5-star non-Vine (unpaid) reviews: 15663
- Percentage of 5-star non-Vine reviews: ~38.7%
- Total number of reviews (Vine and non-Vine): 40565
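The percentages follow directly from the counts, so they are easy to re-check. The snippet below is a minimal sketch in plain Python rather than the Pandas code used in the analysis; the variable names and the `pct` helper are illustrative assumptions, and only the four counts come from the results above.

```python
# Counts as reported in the results above.
vine_total = 94
vine_five_star = 48
non_vine_total = 40471
non_vine_five_star = 15663


def pct(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of 5-star reviews in each group (pct is a hypothetical helper).
vine_pct = pct(vine_five_star, vine_total)
non_vine_pct = pct(non_vine_five_star, non_vine_total)

# Every review is either Vine or non-Vine, so the totals add up.
total_reviews = vine_total + non_vine_total

print(f"Vine 5-star share:     {vine_pct}%")
print(f"Non-Vine 5-star share: {non_vine_pct}%")
print(f"Total reviews:         {total_reviews}")
```

Run against the reported counts, this gives about 51.1% for Vine reviews and about 38.7% for non-Vine reviews, out of 40565 reviews in total.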
We used Pandas to determine whether there is any bias in reviews that were written as part of the Vine program. For this analysis, we checked whether being a paid Vine review makes a difference in the percentage of 5-star reviews. There is a clear difference between Vine and non-Vine reviews, at about 51% and 39% respectively, which suggests that Vine members are biased toward favorable reviews. Note that the Vine group is small (94 reviews out of 40565), so this conclusion should be treated with some caution. Additional statistical analysis, such as comparing the mean, median, and mode of the star ratings in each group, could give a more reliable result.
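One way to put the gap between the two groups on firmer footing is a two-proportion z-test. This was not part of the original analysis; the sketch below is an illustration under the usual assumptions of that test (independent reviews, normal approximation), and the function name `two_proportion_z_test` is hypothetical. It uses only the Python standard library and the counts reported above.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions.

    x1, x2 are the success counts (here: 5-star reviews),
    n1, n2 are the group sizes. Returns (z, p_value).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis that both groups share one rate.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Vine: 48 five-star out of 94; non-Vine: 15663 five-star out of 40471.
z, p_value = two_proportion_z_test(48, 94, 15663, 40471)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A p-value below a chosen threshold such as 0.05 would support the claim that the two 5-star rates really differ. With only 94 Vine reviews, the result is sensitive to small changes in the counts.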