The purpose of this project is to analyze written reviews from the paid Amazon Vine program. The program is a service that lets manufacturers and publishers receive reviews for their products: companies pay a small fee to Amazon and provide products to Amazon Vine members, who are required to publish reviews. This project focuses on a dataset of video game product reviews. PySpark was used to perform the ETL process, including loading the data into an Amazon Web Services (AWS) RDS database accessed through pgAdmin. PySpark was then used to determine whether Vine reviews in the video game dataset were biased toward favorable ratings compared with non-Vine reviews.
Below are the results and the code that produced them.
- There were 94 total reviews from Vine members
- There were 40,471 total reviews from non-Vine members
- Of the 94 Vine reviews, 48 were 5-star reviews
- Of the 40,471 non-Vine reviews, 15,663 were 5-star reviews
- 51.06% of Vine reviews were 5-star reviews
- 38.70% of non-Vine reviews were 5-star reviews
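
The percentages above follow directly from the listed counts. As a quick check, here is a minimal sketch in plain Python. It is not the project's PySpark code, and the variable and function names are illustrative assumptions. It reproduces both percentages and the gap between them:

```python
# Counts copied from the results listed above.
vine_total = 94
vine_five_star = 48
non_vine_total = 40_471
non_vine_five_star = 15_663


def five_star_pct(five_star: int, total: int) -> float:
    """Return the share of 5-star reviews as a percentage of all reviews."""
    return 100.0 * five_star / total


vine_pct = five_star_pct(vine_five_star, vine_total)
non_vine_pct = five_star_pct(non_vine_five_star, non_vine_total)

# The gap is a difference between two percentages, so it is
# measured in percentage points.
gap_points = vine_pct - non_vine_pct

print(f"Vine 5-star share:     {vine_pct:.2f}%")
print(f"Non-Vine 5-star share: {non_vine_pct:.2f}%")
print(f"Gap:                   {gap_points:.2f} percentage points")
```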
In this dataset, Vine reviews were 5-star 12.36 percentage points more often than non-Vine reviews (51.06% versus 38.70%), which points to a positivity bias in Amazon's Vine program. A further analysis could look at other product datasets. Since they are all formatted the same way, it would not be difficult to combine multiple datasets and run the same analysis. This would show whether the bias is limited to the video game dataset or is consistent across all products. Because the video game dataset contains only 94 Vine reviews compared to 40,471 non-Vine reviews, it would be beneficial to include datasets with more Vine reviews.
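
The multi-dataset follow-up could take roughly the shape sketched below. This is plain Python rather than the project's PySpark code. The column names `product_category`, `vine` and `star_rating` are assumptions about the dataset schema, the function name is hypothetical, and the sample rows are made-up placeholders that only show the call shape. They are not real review data.

```python
from collections import defaultdict


def five_star_rates(rows):
    """Return {(category, vine_flag): (total, five_star, pct)} for the rows."""
    counts = defaultdict(lambda: [0, 0])  # [total reviews, 5-star reviews]
    for row in rows:
        key = (row["product_category"], row["vine"])
        counts[key][0] += 1
        if row["star_rating"] == 5:
            counts[key][1] += 1
    return {
        key: (total, five, 100.0 * five / total)
        for key, (total, five) in counts.items()
    }


# Placeholder rows (assumed schema, invented values) standing in for two datasets.
video_games = [
    {"product_category": "Video Games", "vine": "Y", "star_rating": 5},
    {"product_category": "Video Games", "vine": "N", "star_rating": 3},
]
books = [
    {"product_category": "Books", "vine": "Y", "star_rating": 4},
    {"product_category": "Books", "vine": "N", "star_rating": 5},
    {"product_category": "Books", "vine": "N", "star_rating": 5},
]

# Identically formatted datasets can simply be concatenated before analysis.
combined = video_games + books
rates = five_star_rates(combined)

for (category, vine_flag), (total, five, pct) in sorted(rates.items()):
    print(f"{category:12s} vine={vine_flag}: {five}/{total} 5-star ({pct:.2f}%)")
```

Grouping by category as well as Vine status keeps the per-dataset rates visible, so a bias that appears in only one product category is not hidden by the combined totals.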