Practice code from the book above.
- Supervised and unsupervised learning
- Cross validation
- Evaluation metrics
- Project structure for any ML project
- Approaching categorical variables
- OneHot encoding + Logistic Regression model
- This gives us an AUC score of ~0.78, which is good: AUC ranges from 0 to 1, with 1 being a perfect model.
- LabelEncoding
- Random Forest model
- This gives us an AUC score of ~0.71, which is worse than the Logistic Regression model.
- This model also takes more time and space than the Logistic Regression model.
- This shows that we should never skip a simple baseline model when approaching a problem.
- XGBoost model
- This gives us an AUC score of ~0.76, which is better than the Random Forest model but still not better than the Logistic Regression model.
- This model also takes more time and space than both the Logistic Regression and Random Forest models.
- Random Forest model
- OneHot encoding + Logistic Regression model
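
The AUC scores quoted above can be read as the fraction of (positive, negative) sample pairs that a model ranks in the right order. The following is a minimal pure-Python sketch of that definition, not the code used in the book. The function name `auc_score` and the toy labels and scores are made up for illustration.

```python
def auc_score(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy labels and predicted probabilities (hypothetical values).
y_true = [0, 0, 1, 1]
perfect = auc_score(y_true, [0.1, 0.2, 0.8, 0.9])    # every positive above every negative
mixed = auc_score(y_true, [0.1, 0.4, 0.35, 0.8])     # one of four pairs is out of order
reversed_ = auc_score(y_true, [0.9, 0.8, 0.2, 0.1])  # every pair is out of order
print(perfect, mixed, reversed_)  # prints: 1.0 0.75 0.0
```

The pairwise loop is quadratic in the number of samples, so it is only meant to make the definition concrete. On real data a library routine such as scikit-learn's `roc_auc_score` is the practical choice.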