Credit_Risk_Analysis

Overview of Analysis

The purpose of this project is to create and evaluate models that predict credit risk. Credit risk is an unbalanced classification problem, because low-risk loans far outnumber high-risk ones, so choosing the right model matters for a lending company deciding whether to supply loans. Using supervised learning, we trained and evaluated several models and provide a recommendation on which one best predicts credit risk.
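To illustrate why an unbalanced target calls for more than plain accuracy, here is a minimal sketch in pure Python. The class counts are hypothetical (they are not taken from the project's data), and the `recall` helper and the "always low risk" predictor are introduced only for illustration.

```python
# Hypothetical label counts, chosen only to illustrate class imbalance.
y_true = ["low_risk"] * 990 + ["high_risk"] * 10

# A useless model that labels every loan as low risk.
y_pred = ["low_risk"] * len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of samples with the given true label that were predicted as that label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Balanced accuracy is the mean of the per-class recalls.
balanced_accuracy = (
    recall(y_true, y_pred, "low_risk") + recall(y_true, y_pred, "high_risk")
) / 2

print(f"accuracy: {accuracy:.2f}")                    # 0.99
print(f"balanced accuracy: {balanced_accuracy:.2f}")  # 0.50
```

Plain accuracy rewards the model for ignoring the rare class, while balanced accuracy exposes that it never finds a high-risk loan. This is why the results below report balanced accuracy together with per-class precision and recall.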

Results

Naive Random Oversampling

  • The Naive Random Oversampling model results:
    • Balanced accuracy score of 0.6636
    • High-risk precision score of 0.01
    • High-risk recall score of 0.70
    • Low-risk precision score of 1.00
    • Low-risk recall score of 0.62
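As a rough illustration of what naive random oversampling does to the training data, the following sketch duplicates randomly chosen minority rows until the classes are the same size. It is a pure-Python stand-in, not the project's code; the function name `random_oversample` and the toy rows are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        # Draw with replacement to fill the gap up to the majority class size.
        for _ in range(target - count):
            X_res.append(rng.choice(rows))
            y_res.append(label)
    return X_res, y_res


# Hypothetical toy data: 8 low-risk rows and 2 high-risk rows.
X = [[i, i * 2] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

After resampling, both classes contain 8 rows. No new information is created, because the added high-risk rows are exact copies of existing ones.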

SMOTE Oversampling

  • The SMOTE Oversampling model results:
    • Balanced accuracy score of 0.6622
    • High-risk precision score of 0.01
    • High-risk recall score of 0.63
    • Low-risk precision score of 1.00
    • Low-risk recall score of 0.69
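SMOTE differs from naive oversampling in that it synthesizes new minority rows by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows that core idea in pure Python. The function name `smote`, the choice of `k=2`, and the feature rows are hypothetical, and a library implementation handles more cases than this does.

```python
import math
import random


def smote(minority, n_new, k=2, seed=1):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical high-risk feature rows.
high_risk = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]]
new_rows = smote(high_risk, n_new=6)
print(len(new_rows))  # 6
```

Each synthetic row lies on a line segment between two existing high-risk rows, so the new points stay inside the region the minority class already occupies.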

Undersampling

  • The Undersampling model results:
    • Balanced accuracy score of 0.5447
    • High-risk precision score of 0.01
    • High-risk recall score of 0.69
    • Low-risk precision score of 1.00
    • Low-risk recall score of 0.40
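Undersampling balances the classes in the opposite direction, by shrinking the majority class. The source does not state which undersampling algorithm was used, so the sketch below assumes simple random undersampling purely for illustration. The function name `random_undersample` and the toy rows are hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=1):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(y).values())
    X_res, y_res = [], []
    for label in sorted(set(y)):
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        # Sample without replacement so no row is repeated.
        for row in rng.sample(rows, target):
            X_res.append(row)
            y_res.append(label)
    return X_res, y_res


# Hypothetical toy data: 8 low-risk rows and 2 high-risk rows.
X = [[i, i * 2] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_undersample(X, y)
print(Counter(y_res))
```

Here six of the eight low-risk rows are discarded. Throwing away majority data in this way is one plausible reason an undersampled model can score lower than the oversampled ones.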

SMOTEENN Combination Over/Under Sampling

  • The SMOTEENN Combination Over/Under Sampling model results:
    • Balanced accuracy score of 0.6447
    • High-risk precision score of 0.01
    • High-risk recall score of 0.72
    • Low-risk precision score of 1.00
    • Low-risk recall score of 0.57
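SMOTEENN first oversamples with SMOTE and then cleans the result with Edited Nearest Neighbours (ENN). The sketch below illustrates only the cleaning step, in pure Python. It uses a simplified majority-vote rule, and the function name `edited_nearest_neighbours`, the value `k=3`, and the toy points are assumptions for the example rather than the library's exact behaviour.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Drop every sample whose label disagrees with the majority of its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Distance from sample i to every other sample, paired with that sample's label.
        others = [
            (math.dist(xi, xj), yj)
            for j, (xj, yj) in enumerate(zip(X, y))
            if j != i
        ]
        nearest = sorted(others, key=lambda pair: pair[0])[:k]
        majority_label = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        if majority_label == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


# Hypothetical points: two clean clusters plus one high-risk point
# sitting in the middle of the low-risk cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
     [5, 5], [5, 6], [6, 5], [6, 6]]
y = ["low_risk"] * 4 + ["high_risk"] * 5

X_clean, y_clean = edited_nearest_neighbours(X, y)
print(len(X_clean))  # 8
```

Only the point at `[0.5, 0.5]` is removed, because all of its nearest neighbours are low risk. Removing such overlapping samples is intended to give the classifier a cleaner boundary between the classes.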

Ensemble Learners

  • The Balanced Random Forest Classifier model results:
    • Balanced accuracy score of 0.7885
    • High-risk precision score of 0.01
    • High-risk recall score of 0.72
    • Low-risk precision score of 1.00
    • Low-risk recall score of 0.57

  • The Easy Ensemble AdaBoost Classifier model results:
    • Balanced accuracy score of 0.9316
    • High-risk precision score of 0.03
    • High-risk recall score of 0.70
    • Low-risk precision score of 1.00
    • Low-risk recall score of 0.87

Summary

Across all the models evaluated, low-risk precision was 1.00 and high-risk precision was very low (0.01 to 0.03), so precision alone does not separate them. The oversampling, undersampling, and combination models recorded lower balanced accuracy scores (0.54 to 0.66), with Undersampling the lowest at 0.5447. Because high-risk loans are the important ones to predict, we looked most closely at the Ensemble Learners. The Easy Ensemble AdaBoost Classifier had the highest balanced accuracy (0.9316) and the highest low-risk recall (0.87), while its high-risk recall of 0.70 was in line with the other models (0.63 to 0.72). We recommend this model for predicting credit risk.