We are a team of seven (see the collaborators list) who built an enhanced predictive model as our Rakamin Data Science Bootcamp final project. We obtained the dataset from here; follow the link if you would like to know more about it. Our main objective is to build an enhanced predictive model for coupon recommendation, as a business solution to the problem of coupon acceptance rate. The project workflow consisted of 4 stages, summarized below:
Stage 0 is the initial stage, where we carried out the 'ask' step of the data life cycle. It covers our role, problem statement, goal, objective, and business metrics.
Stage 1 is the next step, where we focused on gathering insights from a statistical point of view.
What we did in this stage:
- Descriptive Analysis
- Univariate Analysis
- Multivariate Analysis
- Business Insights as Business Recommendations
Stage 2 is the following step, where we cleaned and transformed the data before using it to build the model.
What we did in this stage:
- Handle Missing Values
- Handle Duplicated Data
- Handle Outliers
- Feature Transformation
- Feature Encoding
- Handle Class Imbalance
- Feature Selection
- Feature Extraction
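A few of the Stage 2 steps above can be sketched with pandas. This is a minimal illustration, not the project's actual pipeline: the toy DataFrame, the column names (`coupon`, `weather`, `Y`), the helper `basic_clean`, and the choice of mode imputation and one-hot encoding are all assumptions made for the example.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Drop duplicate rows, impute missing values, and one-hot encode features."""
    # Handle duplicated data: keep the first occurrence of each row.
    df = df.drop_duplicates().copy()

    # Handle missing values: fill each column with its most frequent value.
    # (Mode imputation is an assumption; other strategies are possible.)
    for col in df.columns:
        if df[col].isna().any():
            df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Feature encoding: one-hot encode every categorical feature column.
    features = pd.get_dummies(df.drop(columns=[target]), dtype=int)
    features[target] = df[target].to_numpy()
    return features


# Hypothetical toy data, only loosely shaped like a coupon dataset.
toy = pd.DataFrame({
    "coupon": ["Bar", "Coffee House", "Bar", None, "Bar"],
    "weather": ["Sunny", "Rainy", "Sunny", "Sunny", "Sunny"],
    "Y": [1, 0, 1, 0, 1],
})
cleaned = basic_clean(toy, target="Y")
print(cleaned)
```

Outlier handling, class balancing, and feature selection are left out here because they depend on choices that this summary does not spell out.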
Stage 3 is where we trained machine learning models on the training data and evaluated them. In this stage we created 7 different preprocessing treatments of the dataset and tested each one with 5 different models: logistic regression, decision tree, random forest, XGBoost, and CatBoost. The aim was to find out, by the end of this stage, not only which model performs better but also which preprocessing treatment works better. Data science is experimental by nature, so we ran these experiments to get a better result.
What we did in this stage:
- Preprocessing Data
- Splitting Data (Training Data and Test Data)
- Feature Engineering
- Model Testing
- Tuning Hyperparameters and Feature Selection
- Model Selection
- Evaluating which features have the most impact on model output using the shap library
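The split, model testing, and model selection steps above could look roughly like the following scikit-learn sketch. It rests on several assumptions: the data is synthetic (from `make_classification`) rather than the coupon dataset, only three of the five models are shown so that the example depends on scikit-learn alone, and all hyperparameters are left near their defaults. XGBoost and CatBoost classifiers could be added to the same `models` dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one preprocessed version of the dataset (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Split into training and test sets, preserving the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(random_state=42),
}

# Fit each model and record the two metrics reported in this README.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
    }

# Model selection: pick the model with the highest test accuracy.
best = max(results, key=lambda name: results[name]["accuracy"])
print(best, results[best])
```

Repeating this loop once per preprocessing treatment gives a treatment-by-model grid of scores, from which both the better model and the better treatment can be read off.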
Based on our project's results, our best model is CatBoost, with an accuracy of 77% and a precision of 77%. Our model could increase the coupon acceptance rate and raise the B/C ratio by 0.61x (from 1.7x to 2.31x).
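For readers unfamiliar with the figures above, here is a small sketch of how accuracy, precision, and the B/C ratio change are computed. The confusion-matrix counts are hypothetical values chosen only so that both metrics come out at 77%; they are not the project's actual counts. The B/C figures (1.7x and 2.31x) are the ones reported above.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted acceptances that were actually accepted."""
    return tp / (tp + fp)


# Hypothetical confusion-matrix counts (illustration only).
tp, fp, tn, fn = 77, 23, 77, 23
acc = accuracy(tp, fp, tn, fn)
prec = precision(tp, fp)

# B/C (benefit/cost) ratio before and after, as reported in this README.
bc_before, bc_after = 1.7, 2.31
absolute_gain = round(bc_after - bc_before, 2)   # gain in "x" units
relative_gain = bc_after / bc_before - 1         # gain as a fraction of the baseline

print(f"accuracy={acc:.2f}, precision={prec:.2f}")
print(f"B/C gain: +{absolute_gain}x ({relative_gain:.1%} relative)")
```

Precision is the natural metric to watch here if each coupon sent has a cost, since it measures how many of the recommended coupons are actually accepted.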
- We presented the results of each stage's progress in Bahasa.