This project is the Capstone Project of the Data Science Nanodegree Program by Udacity, in collaboration with Starbucks. The data sets contain simulated data that mimics customer behavior on the Starbucks rewards mobile app. The main objective is to identify which offers and customers respond best to the company's campaigns and, subsequently, to build a model that predicts the success of an offer from the demographic and categorical information contained in the available data sets.
The code should run with no issues using Python 3 with the following libraries:
- Machine Learning: NumPy, SciPy, pandas, scikit-learn
- Data Visualization: Plotly, Seaborn
- Data
  - portfolio.json - offer ids and metadata about each offer (duration, type, etc.)
  - profile.json - demographic data for each customer
  - transcript.json - records for transactions, offers received, offers viewed, and offers completed
- Code
  - Starbucks_Capstone_notebook.ipynb - code that runs all the analysis
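As a rough sketch of how the data files listed above might be loaded with pandas: the snippet assumes the files are stored as line-delimited JSON (one record per line), which this README does not confirm. The `load_records` helper, the sample records, and the `offer_type` field name are hypothetical and only there to make the example runnable without the real files.

```python
import io

import pandas as pd


def load_records(source):
    """Read a line-delimited JSON source (path or file-like object) into a DataFrame.

    Assumption: one JSON object per line. Adjust `orient`/`lines` if the
    real files are stored differently.
    """
    return pd.read_json(source, orient="records", lines=True)


# Hypothetical sample records standing in for portfolio.json.
sample = io.StringIO(
    '{"id": "offer_a", "offer_type": "bogo", "duration": 7}\n'
    '{"id": "offer_b", "offer_type": "discount", "duration": 10}\n'
)

portfolio_sample = load_records(sample)
print(portfolio_sample.head())
```

With the real files in the working directory, the same helper could be called as `load_records("portfolio.json")`, and likewise for `profile.json` and `transcript.json`, provided the line-delimited assumption holds.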
I used three different models: Random Forest Classifier, Gradient Boosting Classifier, and AdaBoost Classifier. The Gradient Boosting Classifier obtained the best result, with an overall accuracy of 0.67 (67%).
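The comparison of the three classifiers can be sketched roughly as follows. This is not the notebook's actual code: the data here is synthetic (generated with `make_classification`) as a stand-in for the engineered offer and customer features, the default hyperparameters are an assumption, and the scores it prints will not match the result reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and the "offer succeeded" label (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# The three model families named above, with default settings (assumption).
models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
    }
    print(f"{name}: accuracy={scores[name]['accuracy']:.2f}, f1={scores[name]['f1']:.2f}")
```

Reporting the F1 score alongside accuracy is a design choice for this sketch: accuracy alone can look misleadingly good when successful and unsuccessful offers are imbalanced.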
The main findings of the code can be found here
- scikit-learn - Random Forest Classifier
- scikit-learn - Ada Boost Classifier
- Complete Machine Learning Guide
- What makes a good F1-Score
- Gradient Boosting with Scikit-Learn, XGBoost, LightGBM, and CatBoost
Credit must go to Starbucks for the data and to Udacity for the training! Otherwise, feel free to use the code here as you would like!