/AmazonFeaturedOffersDataAnalysis

Predicting which offers on Amazon will be selected as featured offers given a body of historical data. Explores the problem through data analysis, cleaning and predictive modelling.

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

Project Title: Predicting Featured Offers on Amazon

Project Description:

The problem and data come from the Amazon online shopping platform. Several sellers can sell the same product on Amazon. Based on the data the seller provides to Amazon (seller reputation, product price, shipping details, etc.), Amazon ranks seller offers from best to worst for a given product. This ranking is mostly influenced by the price the seller offers, but other features can also influence it. We first need to understand which features are most indicative of a seller being ranked first by Amazon for a product. When a seller is ranked first for a product, we say that the seller is the 'winner' among all the offers: their offer is shown first when a user searches for the product on Amazon, which increases their chances of selling it. Our goal is to work with the data to build and evaluate prediction models that capture the relationship between the descriptive features and the target feature 'IsWinner'.
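As a rough illustration of the "which features are indicative" question, the sketch below checks how often the cheapest offer for a product is also the winner. The data is a hand-made toy table, and every column name except 'IsWinner' (`ProductId`, `Price`, `SellerRating`, `PriceRank`, `PriceGroup`) is a hypothetical placeholder, not a column from the project's dataset.

```python
import pandas as pd

# Toy data for illustration only; all column names except IsWinner are assumptions.
offers = pd.DataFrame({
    "ProductId": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "Price": [10.0, 12.0, 11.0, 5.0, 4.5, 20.0, 19.0, 21.0],
    "SellerRating": [4.5, 4.9, 4.0, 4.8, 3.9, 4.7, 4.6, 4.2],
    "IsWinner": [1, 0, 0, 1, 0, 0, 1, 0],
})

# Rank offers by price within each product (1 = cheapest offer for that product).
offers["PriceRank"] = offers.groupby("ProductId")["Price"].rank(method="min")

# Label each offer as the cheapest for its product or not.
offers["PriceGroup"] = (offers["PriceRank"] == 1).map(
    {True: "cheapest", False: "other"}
)

# Share of winning offers in each group.
winner_rate = offers.groupby("PriceGroup")["IsWinner"].mean()
print(winner_rate)
```

In this toy table the cheapest offer wins for two of the three products, and the remaining winner has a higher seller rating than its cheaper competitor. This mirrors the point above that price dominates but is not the only factor.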

The project was conducted as part of a module at UCD on Data Analytics. It explores many aspects of the Data Analytics process, from data understanding and cleaning to predictive modelling. The models explored are linear regression, logistic regression and random forest. Not all of these models would generally be considered appropriate for this problem (linear regression, for example, predicts a continuous value, whereas 'IsWinner' is a win-or-lose label), but the purpose of the assignment was to analyse each model's suitability and learn why it might or might not fit. The project uses scikit-learn, pandas and matplotlib to analyse and visualise the data and to evaluate the performance of the models explored.
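A minimal sketch of how the three model families could be compared with scikit-learn is shown here. It does not reproduce the project's notebooks: the data is synthetic, the two features (`price_diff`, `rating`) are hypothetical, and thresholding the linear regression output at 0.5 is just one simple way to turn a continuous prediction into a win/lose label.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Synthetic, hypothetical features (not the project's real columns).
price_diff = rng.normal(0.0, 1.0, n)  # price relative to competing offers
rating = rng.uniform(3.0, 5.0, n)     # seller rating
X = np.column_stack([price_diff, rating])

# Synthetic binary target: lower price and higher rating make a win more likely.
logit = -2.0 * price_diff + 1.0 * (rating - 4.0)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "linear_regression": LinearRegression(),
    "logistic_regression": LogisticRegression(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    raw = model.predict(X_test)
    # Linear regression returns continuous values that can fall outside [0, 1],
    # so threshold at 0.5; the classifiers already return 0/1 labels.
    pred = (raw >= 0.5).astype(int)
    results[name] = accuracy_score(y_test, pred)

for name, acc in results.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

Accuracy on a held-out split is only one possible measure. On imbalanced data, where most offers are not winners, precision, recall or a confusion matrix from `sklearn.metrics` would usually be more informative.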