Credit_Risk_Analysis

Overview

This project uses machine learning to build prediction models for credit risk, a classification problem. Credit risk is inherently unbalanced, because good loans easily outnumber risky loans. I train and evaluate several models using different techniques for handling unbalanced classes.
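
To see why unbalanced classes need special handling, here is a minimal sketch. The label counts (990 good loans, 10 risky loans) are made up for illustration and are not taken from this project's data. A "model" that always predicts the majority class looks very accurate while never flagging a single risky loan.

```python
# Hypothetical, illustrative label counts; not taken from the project's data.
y_true = ["low_risk"] * 990 + ["high_risk"] * 10

# A trivial "model" that always predicts the majority class.
y_pred = ["low_risk"] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Fraction of truly high-risk loans that the model flagged.
high_risk_total = sum(t == "high_risk" for t in y_true)
high_risk_caught = sum(
    t == "high_risk" and p == "high_risk" for t, p in zip(y_true, y_pred)
)
high_risk_recall = high_risk_caught / high_risk_total

print(f"accuracy: {accuracy:.2f}")
print(f"high-risk loans caught: {high_risk_recall:.2f}")
```

This is why the results are reported with precision and recall alongside accuracy.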

Results

BACKGROUND INFO:

Precision Score= True Positives / (True Positives + False Positives)

-Of the cases the model predicted as positive, the share that were actually positive.

Recall Score= True Positives / (True Positives + False Negatives)

-Of the cases that were actually positive, the share the model predicted as positive. For example, a person who knows they have a good loan status wants to know how likely the loan officer is to classify them that way.
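
The two scores can be checked with a small worked example. This is a sketch only: the counts below are hypothetical, and `tp`, `fp`, and `fn` stand for true positives (correctly predicted positives), false positives, and false negatives.

```python
def precision(tp: int, fp: int) -> float:
    """Share of positive predictions that were actually positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that the model predicted as positive."""
    return tp / (tp + fn)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn = 80, 20, 40

print(f"precision: {precision(tp, fp):.2f}")  # 80 / (80 + 20)
print(f"recall:    {recall(tp, fn):.2f}")     # 80 / (80 + 40)
```

With these counts, precision is higher than recall: most positive predictions are right, but a third of the actual positives are missed.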

BalancedRandomForestClassifier

Accuracy score= .79
Precision= .99
Recall= .85

Ensemble AdaBoost Classifier

Accuracy score= .91
Precision= .99
Recall= .93

Naive Random Oversampling w/ Logistic Regression

Accuracy score= .68
Precision= .99
Recall= .68
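
As a rough illustration of the idea behind naive random oversampling, the sketch below duplicates randomly chosen rows of the smaller class until every class is as large as the biggest one. The function `random_oversample` and the toy data are hypothetical and written in plain Python; this is not the code used in this project. Libraries such as imbalanced-learn provide a `RandomOverSampler` for the same purpose.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Sample with replacement to make up the shortfall for this class.
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 6 low-risk rows and 2 high-risk rows.
X = [[1], [2], [3], [4], [5], [6], [10], [11]]
y = ["low_risk"] * 6 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Resampling like this is normally applied to the training split only, so that the test set keeps the real class balance.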

SMOTE Oversampling w/ Logistic Regression

Accuracy score= .66
Precision= .99
Recall= .69
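
SMOTE differs from naive oversampling in that it creates new synthetic minority rows instead of copying existing ones: each new row lies on the line segment between a minority row and one of its nearest minority neighbours. The sketch below is a simplified, plain-Python illustration of that interpolation step. The function `smote_like`, its parameters, and the toy data are hypothetical, and this is not the implementation used by imbalanced-learn or by this project.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random point on the segment between base and neighbour.
        lam = rng.random()
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class rows with two features each.
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]]
new_points = smote_like(minority, n_new=4)
for point in new_points:
    print([round(v, 2) for v in point])
```

Because every synthetic row is a blend of two real minority rows, it stays inside the region the minority class already occupies.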

ClusterCentroids Undersampling w/ Logistic Regression

Accuracy score= .60
Precision= .99
Recall= .53

Combination (Over and Under) Sampling w/ Logistic Regression

Accuracy score= .66
Precision= .99
Recall= .64

Summary

Overall, the two ensemble models, the Ensemble AdaBoost Classifier and the Balanced Random Forest Classifier, performed best on this unbalanced credit risk problem. Both have a precision of .99, and their accuracy and recall scores are closer to 1 than those of any resampling technique paired with logistic regression. Of the two, I recommend the Ensemble AdaBoost Classifier, because it has the higher accuracy (.91 vs. .79) and recall (.93 vs. .85).
