Machine Learning project utilizing various models to determine credit risk.

Primary language: Jupyter Notebook

Credit_Risk_Analysis

Overview

  • The objective of this project was to evaluate models and various sampling techniques to predict credit risk.
  • Logistic regression performance was compared across the following sampling algorithms:
    • RandomOverSampler
    • SMOTE
    • ClusterCentroids
    • SMOTEENN

The logistic regression results were then compared with the performance of the BalancedRandomForestClassifier and EasyEnsembleClassifier ensemble models.

Results

Oversampling

RandomOverSampler + LogisticRegression

SMOTE + LogisticRegression

Undersampling

ClusterCentroids + LogisticRegression

Combination (Over and Under Sampling)

SMOTEENN + LogisticRegression

Ensemble Learners

BalancedRandomForestClassifier

EasyEnsembleClassifier

Summary

Undersampling markedly reduced recall and produced an accuracy score barely above 50%. SMOTEENN scored slightly better on both metrics, but if this dataset is to be analyzed with logistic regression, oversampling appears to be the better choice.

Of the methods tried in this project, however, the EasyEnsembleClassifier model was the best at predicting credit risk, scoring much better than every other approach.

With a balanced accuracy score of 93.17%, 99% precision, and 94% recall, it is extremely successful at predicting credit risk correctly.
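
For readers unfamiliar with these metrics, the sketch below shows how precision, recall, and balanced accuracy are derived from a binary confusion matrix. The helper `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the project's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (precision, recall, balanced_accuracy) for the positive class."""
    precision = tp / (tp + fp)       # share of predicted positives that are correct
    recall = tp / (tp + fn)          # share of actual positives that are found
    specificity = tn / (tn + fp)     # recall of the negative class
    # Balanced accuracy is the mean of the per-class recalls, so a model
    # cannot score well just by always predicting the majority class.
    balanced_accuracy = (recall + specificity) / 2
    return precision, recall, balanced_accuracy


# Hypothetical confusion-matrix counts (not taken from the project).
precision, recall, bal_acc = binary_metrics(tp=90, fp=10, fn=10, tn=890)
print(f"precision={precision:.3f}, recall={recall:.3f}, "
      f"balanced accuracy={bal_acc:.3f}")
```

Because balanced accuracy averages recall over both classes, it is usually a more informative headline number than plain accuracy on heavily imbalanced data such as credit risk labels.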