Credit_Risk_Analysis

Overview

The objective of this project was to evaluate several machine learning models and sampling techniques for predicting credit risk.
The performance of a LogisticRegression model was compared across the following resampling algorithms:
- RandomOverSampler
- SMOTE
- ClusterCentroids
- SMOTEENN
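To make the difference between these resampling strategies concrete, here is a minimal, dependency-free sketch of three of them: random oversampling, SMOTE-style interpolation, and the Edited Nearest Neighbours (ENN) clean-up step that SMOTEENN applies after SMOTE. These are simplified illustrations, not the imbalanced-learn implementations used in the project (those classes expose `fit_resample(X, y)`), and the helper names `random_oversample`, `smote_like`, and `enn_clean` are hypothetical. ClusterCentroids, which replaces majority-class rows with k-means centroids, is not sketched here.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def random_oversample(X, y, seed=1):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller class
    until every class matches the largest one (the idea behind RandomOverSampler)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # exact copy of an existing row
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=5, seed=1):
    """Hypothetical helper: create synthetic minority rows on the segment between
    a minority row and one of its k nearest minority neighbours (the core SMOTE
    step). Needs at least two minority rows."""
    rng = random.Random(seed)
    minority_rows = [x for x, lab in zip(X, y) if lab == minority]
    needed = max(Counter(y).values()) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    for _ in range(needed):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        # Indices of the other minority rows, nearest first.
        others = sorted(
            (j for j in range(len(minority_rows)) if j != i),
            key=lambda j: sq_dist(base, minority_rows[j]),
        )
        neighbour = minority_rows[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        X_out.append([bv + gap * (nv - bv) for bv, nv in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


def enn_clean(X, y, k=3):
    """Hypothetical helper: Edited Nearest Neighbours with a majority-vote rule.
    Drops each row whose label disagrees with the majority label of its
    k nearest neighbours (the clean-up half of SMOTEENN)."""
    X_out, y_out = [], []
    for i, (row, label) in enumerate(zip(X, y)):
        nearest = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sq_dist(row, X[j]),
        )[:k]
        majority_label = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if majority_label == label:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out
```

Random oversampling only adds exact copies, whereas `smote_like` adds new points between existing minority rows, so the two can give LogisticRegression different decision boundaries. Chaining the helpers, as in `enn_clean(*smote_like(X, y, minority=1))`, roughly mirrors SMOTEENN; the library's ENN can apply a stricter rule than the majority vote used here.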

These results were compared with the performance of two ensemble classifiers: BalancedRandomForestClassifier and EasyEnsembleClassifier.
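Both ensemble classifiers handle the class imbalance inside the model instead of resampling once up front. In imbalanced-learn, BalancedRandomForestClassifier randomly under-samples each tree's bootstrap sample, and EasyEnsembleClassifier trains AdaBoost learners on balanced subsets produced by random under-sampling. The sketch below shows only that shared balancing step plus a simple vote over member predictions; the helper names `balanced_subsets` and `majority_vote` are hypothetical, and the library classes combine members by averaging predicted probabilities rather than by a hard vote.

```python
import random
from collections import Counter


def balanced_subsets(y, n_subsets, seed=0):
    """Hypothetical helper: return `n_subsets` lists of row indices in which every
    class is randomly under-sampled (without replacement) to the size of the
    smallest class. Each subset would be used to fit one ensemble member."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for idx in by_class.values():
            subset.extend(rng.sample(idx, n_min))
        subsets.append(sorted(subset))
    return subsets


def majority_vote(member_predictions):
    """Hypothetical helper: combine per-member predictions (one list of labels per
    member) by hard majority vote, a simplification of probability averaging."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*member_predictions)]
```

For a toy label vector with 2 minority rows and 6 majority rows, `balanced_subsets(y, n_subsets=3)` returns three 4-row index lists, each holding both minority rows and a randomly drawn pair of majority rows. Every member trains on balanced data, while the ensemble as a whole still sees much of the majority class.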

Results

Oversampling

RandomOverSampler + LogisticRegression

SMOTE + LogisticRegression

Undersampling

ClusterCentroids + LogisticRegression

Combination (Over and Under Sampling)

SMOTEENN + LogisticRegression

Ensemble Learners

BalancedRandomForestClassifier

EasyEnsembleClassifier

Summary

Undersampling with ClusterCentroids noticeably reduced recall, and its accuracy score was barely above 50%. SMOTEENN scored somewhat better on both metrics, but if this dataset were to be analyzed with logistic regression, oversampling would be the better choice.

The best way to predict credit risk, however, is the EasyEnsembleClassifier model, which scored much better than every other method tried in this project.

With a balanced accuracy score of 93.17%, 99% precision, and 94% recall, it is extremely successful at predicting credit risk correctly.
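The models above are compared on balanced accuracy, precision, and recall. A minimal sketch of how these are computed for a binary problem follows; the labels are toy data, not project results, and treating `1` as the positive (high-risk) class is an assumption. Balanced accuracy is the mean of the recall on each class, which is how scikit-learn's `balanced_accuracy_score` defines it in the binary case, so a model that always predicts the majority class scores exactly 0.5. That is why a balanced accuracy barely above 50% signals a model only slightly better than that trivial baseline.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary problem, treating `positive` as the
    class of interest (assumed here to be the high-risk class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0       # recall of the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall of the other class
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
    }


# Toy labels (hypothetical, not from this project): 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_model = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
y_majority = [0] * len(y_true)  # always predict the majority class

print(binary_metrics(y_true, y_model))
print(binary_metrics(y_true, y_majority))
```

On these toy labels the model reaches a balanced accuracy of about 0.76, while the majority-class baseline gets exactly 0.5 even though its plain accuracy is 0.7.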