The Starbucks Udacity Data Scientist Nanodegree Capstone Project data set is a simulation of customer behavior on the Starbucks rewards mobile application. Starbucks sends users offers, each of which may be an advertisement, a discount, or a buy one get one free (BOGO) deal.
This is the link to the dataset
This data set contains three files:
- The first file describes the characteristics of each offer, including its duration and the amount a customer needs to spend to complete it (its difficulty).
- The second file contains customer demographic data, including their age, gender, income, and when they created an account on the Starbucks rewards mobile application.
- The third file describes customer purchases and when they received, viewed, and completed an offer. An offer is only successful when a customer both views the offer and meets or exceeds its difficulty within the offer's duration.
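The success rule described above can be sketched as a small function. This is only an illustration, not the project's actual implementation: the function name, its parameters, and the use of hours as the time unit are hypothetical, and it assumes that the view itself must also fall within the offer's duration.

```python
from typing import List, Optional, Tuple


def offer_successful(
    received: float,
    duration: float,
    difficulty: float,
    viewed: Optional[float],
    transactions: List[Tuple[float, float]],
) -> bool:
    """Return True if an offer counts as successful (hypothetical helper).

    received, viewed, and transaction times are in hours (assumed unit).
    transactions is a list of (time, amount) pairs for one customer.
    """
    deadline = received + duration

    # Assumption: the customer must view the offer before it expires.
    viewed_in_window = viewed is not None and received <= viewed <= deadline

    # Sum only the purchases made while the offer was active.
    spent = sum(amount for time, amount in transactions
                if received <= time <= deadline)

    return viewed_in_window and spent >= difficulty


# Viewed at hour 6 and spent 11.5 against a difficulty of 10 within 168 hours.
print(offer_successful(0, 168, 10, 6, [(12, 4.0), (30, 7.5)]))
```

An offer that was never viewed (`viewed=None`) is treated as unsuccessful even when the spending threshold is met, which matches the rule that the customer must both view and complete the offer.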
The problem is to build a model that predicts whether a customer will respond to an offer. The strategy for solving this problem has four steps.
- Combining the offer portfolio, customer profile, and transaction data. Each row of this combined dataset will describe an offer's attributes, customer demographic data, and whether the offer was successful.
- Assessing the accuracy and F1-score of a naive model that assumes all offers were successful. This provides a baseline for evaluating the performance of the models that I construct. Accuracy measures how often a model correctly predicts whether an offer is successful. However, if the percentage of successful or unsuccessful offers is very low, accuracy alone is not a good measure of model performance. In that situation, evaluating a model's precision and recall provides better insight into its performance. I chose the F1-score metric because it is "a weighted average of the precision and recall metrics".
- Comparing the performance of logistic regression, random forest, and gradient boosting models.
- Refining the parameters of the model that has the highest accuracy and F1-score.
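To make the naive baseline from the second step concrete, here is a minimal sketch in plain Python. The function name and the toy labels are hypothetical and are not taken from the project's data. It assumes labels are encoded as 1 (successful) and 0 (unsuccessful). When every offer is predicted successful, recall is 1 and precision equals the share of successful offers.

```python
from typing import Sequence, Tuple


def naive_baseline_scores(labels: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and F1-score of a model that predicts every offer as successful.

    labels: true outcomes, 1 = successful offer, 0 = unsuccessful (assumed encoding).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")

    true_positives = sum(1 for y in labels if y == 1)
    false_positives = n - true_positives  # every unsuccessful offer is mispredicted
    false_negatives = 0                   # the model never predicts "unsuccessful"

    accuracy = true_positives / n
    precision = true_positives / (true_positives + false_positives)
    recall = (true_positives / (true_positives + false_negatives)
              if true_positives > 0 else 0.0)

    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Toy example: 3 successful offers out of 10 (made-up numbers).
accuracy, f1 = naive_baseline_scores([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```

With scikit-learn available, `accuracy_score` and `f1_score` from `sklearn.metrics` should give the same values when passed the true labels and an all-ones prediction vector. A trained model is only useful if it beats both baseline numbers.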
A blog post about this project can be found on Medium.
- pandas: Python Data Analysis Library
- NumPy
- Matplotlib
- Scikit-learn: Machine Learning in Python
- Seaborn: Statistical Data Visualization
- re: Regular expression operations
- os: Miscellaneous operating system interfaces
- Joblib: running Python functions as pipeline jobs
Open this Jupyter Notebook on Colab
This project is licensed under the MIT License