Starbucks-Project

Overview

Starbucks Udacity Data Scientist Nanodegree Capstone Project data set is a simulation of customer behavior on the Starbucks rewards mobile application. Starbucks sends offers to users that may be an advertisement, discount, or buy one get one free .

Data

This is the link to the dataset

This data set contains three files:

The first file describes the characteristics of each offer, including its duration and the amount a customer needs to spend to complete it .
The second file contains customer demographic data including their age, gender, income, and when they created an account on the Starbucks rewards mobile application.
The third file describes customer purchases and when they received, viewed, and completed an offer. An offer is only successful when a customer both views an offer and meets or exceeds its difficulty within the offer's duration.

Statement of Problem

The problem is to build a model that predicts whether a customer will respond to an offer. The strategy for solving this problem has four steps.

Combining the offer portfolio, customer profile, and transaction data. Each row of this combined dataset will describe an offer's attributes, customer demographic data, and whether the offer was successful.
Assessing the accuracy and F1-score of a naive model that assumes all offers were successful. This provides a baseline for evaluating the performance of models that I construct. Accuracy measures how well a model correctly predicts whether an offer is successful. However, if the percentage of successful or unsuccessful offers is very low. For this situation, evaluating a models' precision and recall provides better insight to its performance. I chose the F1-score metric because it is "a weighted average of the precision and recall metrics".
Comparing the performance of logistic regression, random forest, and gradient boosting models.
Refining the parameters of the model that has the highest accuracy and F1-score.

Blog Post

Blog post about this project can be found on Medium

Libraries

Python Data Analysis Library
Numpy
Matplotlib
Scikit-learn: Machine Learning in Python
Seaborn: Statistical Data Visualization
re: Regular expression operations
os: Miscellaneous operating system interfaces
Joblib: running Python functions as pipeline jobs

makozi/Starbucks-Project