- The objective of this project was to evaluate models and various sampling techniques to predict credit risk.
- The performance of a logistic regression model was compared across the following sampling algorithms:
- RandomOverSampler
- SMOTE
- ClusterCentroids
- SMOTEENN
These results were then compared with the performance of the BalancedRandomForestClassifier and EasyEnsembleClassifier ensemble models.
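
To make the resampling idea concrete, here is a minimal, dependency-free sketch of random oversampling: minority-class rows are duplicated at random until every class is as large as the majority class. This is only an illustration of the concept behind RandomOverSampler, not the library's implementation. The function name `random_oversample`, the toy data, and the `low_risk`/`high_risk` labels are assumptions made up for this example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())

    # Start from a copy of the original data, then append duplicates.
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_res.append(rng.choice(rows))
            y_res.append(label)
    return X_res, y_res


# Toy, made-up data: 8 low-risk rows and 2 high-risk rows.
X = [[i] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

After resampling, both classes contain the same number of rows, so a classifier trained on `X_res` and `y_res` no longer sees the minority class as rare. Undersampling works in the opposite direction by shrinking the majority class, which discards data.
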
RandomOverSampler + LogisticRegression
ClusterCentroids + LogisticRegression
Undersampling with ClusterCentroids appears to have sharply reduced recall, and its accuracy score was barely above 50%. SMOTEENN scored slightly better on both metrics, but if this dataset is to be analyzed with logistic regression, oversampling looks like the better choice.
Of the approaches tested, however, the EasyEnsembleClassifier model predicted credit risk best, scoring much better than every other method tried in this project.
With a balanced accuracy score of 93.17%, 99% precision, and 94% recall, it is extremely successful at predicting credit risk correctly.
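
For readers unfamiliar with these metrics, the sketch below shows how balanced accuracy, precision, and recall can be derived from the four cells of a binary confusion matrix. The counts are hypothetical and are not the project's results. The helper name `binary_metrics` is likewise an assumption for illustration, and the project's own figures may have been averaged differently (for example, weighted across classes).

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion-matrix counts
    (hypothetical helper; the positive class is the high-risk class)."""
    recall = tp / (tp + fn)        # share of actual positives that were found
    specificity = tn / (tn + fp)   # share of actual negatives that were found
    precision = tp / (tp + fp)     # share of predicted positives that were right
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Balanced accuracy averages the per-class recalls, so the majority
    # class cannot dominate the score.
    balanced_accuracy = (recall + specificity) / 2
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Made-up counts for an imbalanced test set of 1,000 loans.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because plain accuracy is dominated by the majority class on imbalanced data, balanced accuracy is usually the more informative headline number for a credit risk dataset.
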