Credit_Risk_Analysis

Overview

In this project, I use machine learning to build prediction models for credit risk, a classification problem. Credit risk is inherently unbalanced, because good loans easily outnumber risky loans. I use several techniques to train and evaluate models on these unbalanced classes.

Results

  • BACKGROUND INFO:

  • Precision score = True Positives / (True Positives + False Positives)

    - Of the people the model predicts to be positive, how likely each one is to actually be positive.

  • Recall score = True Positives / (True Positives + False Negatives)

    - Of the people who actually are positive (for example, who actually have a good loan status), how likely the model, acting as the "loan officer", is to classify them as positive.
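The two formulas can be checked with a few lines of plain Python. This is a minimal sketch, not code from the project's notebook: the labels "good" and "risky", the helper names `confusion_counts` and `precision_recall`, and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for the given positive class.

    By convention in this sketch, an empty denominator gives 0.0.
    """
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # TP / (TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0     # TP / (TP + FN)
    return precision, recall


# Hypothetical labels for six loans; "risky" is treated as the positive class.
y_true = ["risky", "risky", "risky", "good", "good", "good"]
y_pred = ["risky", "risky", "good", "good", "good", "good"]
print(precision_recall(y_true, y_pred, positive="risky"))  # precision 1.0, recall 2/3
```

Here every loan flagged as risky really is risky (precision 1.0), but only 2 of the 3 actual risky loans are found (recall 2/3).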

BalancedRandomForestClassifier

  • Accuracy score: 0.79
  • Precision: 0.99
  • Recall: 0.85
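A balanced random forest is a random forest in which each tree is trained on a sample where the classes are balanced, so the minority (risky) class is not swamped by the majority. The plain-Python sketch below shows only that per-tree sampling step, under the assumption that every class is drawn with replacement down to the size of the smallest class. It is not imbalanced-learn's implementation, and `balanced_bootstrap` and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap(X, y, rng):
    """Draw a class-balanced sample with replacement.

    Every class contributes as many rows as the smallest class has, so a
    tree fit on this sample sees risky and good loans in equal numbers.
    """
    rows_by_class = defaultdict(list)
    for i, label in enumerate(y):
        rows_by_class[label].append(i)
    n_min = min(len(rows) for rows in rows_by_class.values())
    chosen = []
    for rows in rows_by_class.values():
        chosen.extend(rng.choices(rows, k=n_min))  # sample with replacement
    rng.shuffle(chosen)
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Hypothetical toy data: 95 "good" loans and 5 "risky" loans, one feature each.
X = [[float(i)] for i in range(100)]
y = ["good"] * 95 + ["risky"] * 5

rng = random.Random(42)
samples = [balanced_bootstrap(X, y, rng) for _ in range(3)]  # one sample per tree
for _, y_sample in samples:
    print(Counter(y_sample))  # 5 good and 5 risky in every sample
```

Each tree of the forest would then be fit on one of these samples, and the forest combines the trees' predictions.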

Ensemble AdaBoost Classifier

  • Accuracy score: 0.91
  • Precision: 0.99
  • Recall: 0.93
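AdaBoost fits weak learners one after another. After each round it raises the weight of the rows that learner misclassified, so the next learner concentrates on them, and it gives each learner a vote weight alpha = 0.5 * ln((1 - err) / err), where err is the learner's weighted error. The sketch below shows one round of the classic discrete AdaBoost update in plain Python, with labels coded as +1 (risky) and -1 (good) as an assumption for the sketch. It is not the project's code, the project's exact ensemble configuration is not shown here, and `adaboost_round` is a hypothetical helper.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One round of discrete AdaBoost for labels in {-1, +1}.

    Returns the weak learner's vote weight (alpha) and the renormalised
    sample weights for the next round. Assumes 0 < weighted error < 1.
    """
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified rows (t * p == -1) are scaled up by e^alpha;
    # correctly classified rows (t * p == +1) are scaled down by e^-alpha.
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(new)
    return alpha, [w / norm for w in new]


# Hypothetical round: 4 loans with equal weight, +1 = risky, -1 = good.
y_true = [+1, -1, -1, -1]
y_pred = [-1, -1, -1, -1]  # the weak learner missed the one risky loan
alpha, weights = adaboost_round([0.25] * 4, y_true, y_pred)
print(round(alpha, 4))                 # 0.5493
print([round(w, 4) for w in weights])  # [0.5, 0.1667, 0.1667, 0.1667]
```

After one round, the single missed risky loan carries half of the total weight, so the next learner is pushed to get it right.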

Naive Random Oversampling w/ Logistic Regression

  • Accuracy score: 0.68
  • Precision: 0.99
  • Recall: 0.68
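Naive random oversampling balances the training set by duplicating randomly chosen rows of the minority class until it matches the majority class; the logistic regression is then fit on the resampled data. The sketch below implements that idea in plain Python rather than with a resampling library; `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Naive random oversampling: duplicate randomly chosen rows of each
    smaller class until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_max = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [i for i, t in enumerate(y) if t == label]
        for i in rng.choices(rows, k=n_max - n):  # sample with replacement
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 8 good loans, 2 risky loans.
X = [[i, i * 10] for i in range(10)]
y = ["good"] * 8 + ["risky"] * 2
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # 8 rows of each class
```

Because the added rows are exact copies, the model can overfit to the few original risky loans, which is the weakness SMOTE tries to address.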

SMOTE Oversampling w/ Logistic Regression

  • Accuracy score: 0.66
  • Precision: 0.99
  • Recall: 0.69
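SMOTE (Synthetic Minority Over-sampling Technique) also oversamples the minority class, but it creates new synthetic rows instead of copying existing ones: each new row is placed at a random point on the segment between a minority row and one of its nearest minority-class neighbours. The sketch below is a simplified plain-Python version of that interpolation step, not the library implementation; `smote`, its parameters and the toy data are hypothetical.

```python
import math
import random


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows in the style of SMOTE.

    Each new row lies on the line segment between a random minority row
    and one of its k nearest minority neighbours (Euclidean distance).
    Needs at least two minority rows.
    """
    rng = random.Random(seed)
    k = min(k, len(X_min) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        neighbours = sorted(
            (j for j in range(len(X_min)) if j != i),
            key=lambda j: math.dist(base, X_min[j]),
        )[:k]
        other = X_min[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical minority (risky) rows that all lie on the line x1 == x2.
risky = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
new_rows = smote(risky, n_new=6, k=2)
print(new_rows)  # every new row also satisfies x1 == x2
```

Since new rows are interpolated rather than copied, they stay inside the region already occupied by the minority class.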

ClusterCentroids Undersampling w/ Logistic Regression

  • Accuracy score: 0.60
  • Precision: 0.99
  • Recall: 0.53
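ClusterCentroids undersampling shrinks the majority class instead of growing the minority: it runs k-means on the majority rows and replaces them with the cluster centroids, so the majority class ends up the same size as the minority. The plain-Python sketch below assumes a binary problem and a basic Lloyd's k-means; the helpers `kmeans` and `cluster_centroids_undersample` and the toy data are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_centroids_undersample(X, y, majority, seed=0):
    """Replace the majority class with as many k-means centroids as there
    are minority rows (binary case only)."""
    minority = [(x, t) for x, t in zip(X, y) if t != majority]
    majority_rows = [x for x, t in zip(X, y) if t == majority]
    centroids = kmeans(majority_rows, k=len(minority), seed=seed)
    X_out = [x for x, _ in minority] + centroids
    y_out = [t for _, t in minority] + [majority] * len(centroids)
    return X_out, y_out


# Hypothetical toy data: 6 good loans in two groups, 2 risky loans.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 0], [0, 5]]
y = ["good"] * 6 + ["risky"] * 2
X_res, y_res = cluster_centroids_undersample(X, y, majority="good")
print(y_res)  # ['risky', 'risky', 'good', 'good']
```

Undersampling throws information away, which is consistent with this model having the lowest recall (0.53) of the six.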

Combination (Over and Under) Sampling w/ Logistic Regression

  • Accuracy score: 0.66
  • Precision: 0.99
  • Recall: 0.64
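A combination sampler pairs an oversampler with an undersampler. One common pairing, assumed here because the exact choice is not stated above, is SMOTE followed by Edited Nearest Neighbours (ENN) cleaning, which removes rows whose neighbourhood mostly belongs to the other class. The sketch below shows the classic majority-vote ENN rule in plain Python; `edited_nearest_neighbours` and the toy data are hypothetical, and library versions may use a stricter rule.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Drop every row whose label disagrees with the majority label of its
    k nearest neighbours (Euclidean distance, the row itself excluded)."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(x, X[j]),
        )[:k]
        vote = Counter(y[j] for j in neighbours).most_common(1)[0][0]
        if vote == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


# Hypothetical data after an oversampling step: one risky row sits
# inside a group of good rows and should be cleaned away.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
     [5, 5], [5, 6], [6, 5], [6, 6]]
y = ["good"] * 4 + ["risky"] * 5
X_clean, y_clean = edited_nearest_neighbours(X, y)
print(len(X_clean))  # 8: only the stray risky row at [0.5, 0.5] is removed
```

The oversampling step grows the minority class, and the cleaning step then removes noisy rows near the class boundary.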

Summary

Overall, the Ensemble AdaBoost Classifier and the Balanced Random Forest Classifier were the best models for this unbalanced credit-risk classification problem. They had the highest accuracy and recall scores of the six models; precision was 0.99 for every model, so it does not separate them. Of the two, I recommend the Ensemble AdaBoost Classifier, because its accuracy (0.91) and recall (0.93) are both higher than those of the Balanced Random Forest Classifier (0.79 and 0.85).
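The ranking in the summary can be reproduced from the scores listed in the Results section. The snippet below sorts the six models by accuracy and breaks ties by recall; this ordering rule is an assumption for illustration, and the model labels are shortened versions of the section headings.

```python
# Scores as reported in the Results section: (accuracy, precision, recall).
scores = {
    "BalancedRandomForestClassifier": (0.79, 0.99, 0.85),
    "Ensemble AdaBoost Classifier": (0.91, 0.99, 0.93),
    "Naive Random Oversampling": (0.68, 0.99, 0.68),
    "SMOTE Oversampling": (0.66, 0.99, 0.69),
    "ClusterCentroids Undersampling": (0.60, 0.99, 0.53),
    "Combination Sampling": (0.66, 0.99, 0.64),
}

# Sort by accuracy, then by recall to break ties (highest first).
ranked = sorted(scores, key=lambda m: (scores[m][0], scores[m][2]), reverse=True)
for name in ranked:
    accuracy, precision, recall = scores[name]
    print(f"{name}: accuracy={accuracy}, recall={recall}")
```

Precision is left out of the sort key because it is identical (0.99) for every model.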