The purpose of this project is to create and evaluate models that predict credit risk. Credit risk is an imbalanced classification problem, so choosing the right model matters for a lending company deciding whether to supply loans. Using supervised learning, we trained and evaluated several models and provide a recommendation on which one best predicts credit risk.
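
To make the imbalance issue concrete, here is a minimal pure-Python sketch of naive random oversampling, the first resampling method listed in the results. It is an illustration only, not the project's code: the function name `random_oversample`, the toy rows, and the label strings are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to close the gap to the largest class
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: 2 high-risk loans among 10
rows = [[i] for i in range(10)]
labels = ["high_risk"] * 2 + ["low_risk"] * 8

res_rows, res_labels = random_oversample(rows, labels)
balanced_counts = Counter(res_labels)
```

After resampling, both classes have the same number of rows, so a classifier trained on `res_rows` no longer sees the low-risk class dominate. Only the training split should be resampled; the test split keeps its original class ratio.
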
- The Naive Random Oversampling model results:
  - Balanced accuracy score of 0.6636
  - High-risk precision score of 0.01
  - High-risk recall score of 0.70
  - Low-risk precision score of 1.00
  - Low-risk recall score of 0.62
- The SMOTE Oversampling model results:
  - Balanced accuracy score of 0.6622
  - High-risk precision score of 0.01
  - High-risk recall score of 0.63
  - Low-risk precision score of 1.00
  - Low-risk recall score of 0.69
- The Undersampling model results:
  - Balanced accuracy score of 0.5447
  - High-risk precision score of 0.01
  - High-risk recall score of 0.69
  - Low-risk precision score of 1.00
  - Low-risk recall score of 0.40
- The SMOTEENN Combination over/under sampling model results:
  - Balanced accuracy score of 0.6447
  - High-risk precision score of 0.01
  - High-risk recall score of 0.72
  - Low-risk precision score of 1.00
  - Low-risk recall score of 0.57
- The Balanced Random Forest Classifier model results:
  - Balanced accuracy score of 0.7885
  - High-risk precision score of 0.01
  - High-risk recall score of 0.72
  - Low-risk precision score of 1.00
  - Low-risk recall score of 0.57
- The Easy Ensemble AdaBoost Classifier model results:
  - Balanced accuracy score of 0.9316
  - High-risk precision score of 0.03
  - High-risk recall score of 0.70
  - Low-risk precision score of 1.00
  - Low-risk recall score of 0.87
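
The scores above all derive from a confusion matrix, and balanced accuracy is the mean of the two per-class recalls. The sketch below shows the arithmetic in plain Python. The helper name `class_metrics` and the counts passed to it are hypothetical, chosen only to resemble a heavily imbalanced test set; they are not the project's actual confusion matrix.

```python
def class_metrics(tp, fn, fp, tn):
    """Per-class precision/recall and balanced accuracy, treating
    high-risk as the positive class.

    tp: high-risk loans predicted high-risk
    fn: high-risk loans predicted low-risk
    fp: low-risk loans predicted high-risk
    tn: low-risk loans predicted low-risk
    """
    high_recall = tp / (tp + fn)
    high_precision = tp / (tp + fp)
    low_recall = tn / (tn + fp)
    low_precision = tn / (tn + fn)
    return {
        "balanced_accuracy": (high_recall + low_recall) / 2,
        "high_risk_precision": high_precision,
        "high_risk_recall": high_recall,
        "low_risk_precision": low_precision,
        "low_risk_recall": low_recall,
    }


# Hypothetical counts: 100 high-risk loans versus 17,000 low-risk loans
metrics = class_metrics(tp=70, fn=30, fp=6460, tn=10540)
rounded = {name: round(value, 2) for name, value in metrics.items()}
```

With these assumed counts, high-risk recall is 0.70 but high-risk precision is only about 0.01, because the 6,460 false alarms drawn from the much larger low-risk class swamp the 70 true detections. The same effect keeps low-risk precision near 1.00 even when low-risk recall is modest.
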
Across all the models and methods evaluated, every model reached a low-risk precision score of 1.00, while high-risk precision stayed between 0.01 and 0.03. The Oversampling and Undersampling models recorded the lower balanced accuracy scores, with Undersampling the lowest at 0.5447. Because high-risk loans are the important ones to predict, we looked most closely at the Ensemble Learners. The Easy Ensemble AdaBoost Classifier had the best overall performance, with the highest balanced accuracy (0.9316), the highest low-risk recall (0.87), and a high-risk recall (0.70) comparable to the other models. We recommend using this model for predicting credit risk.
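
The comparison behind this recommendation can be reproduced from the balanced accuracy scores reported above. The snippet below is only a sketch for sorting those reported values; the dictionary and variable names are hypothetical.

```python
# Balanced accuracy scores copied from the results list
reported = {
    "Naive Random Oversampling": 0.6636,
    "SMOTE Oversampling": 0.6622,
    "Undersampling": 0.5447,
    "SMOTEENN": 0.6447,
    "Balanced Random Forest Classifier": 0.7885,
    "Easy Ensemble AdaBoost Classifier": 0.9316,
}

# Sort model names from highest to lowest balanced accuracy
ranking = sorted(reported, key=reported.get, reverse=True)
best = ranking[0]

for name in ranking:
    print(f"{name}: {reported[name]:.4f}")
```

Balanced accuracy is used as the sort key here because plain accuracy would be dominated by the low-risk class.
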