/credit-risk-classification

In this repository, I use various techniques to train and evaluate a model based on loan risk. I use a dataset of historical lending activity from a peer-to-peer lending services company to build a model that can identify the creditworthiness of borrowers.

Primary LanguageJupyter NotebookGNU General Public License v3.0GPL-3.0

Credit Risk Classification

Overview of the Analysis

  • The purpose of this analysis is to evaluate two machine learning models to evaluate loan risk.
  • A dataset of historical lending activity from a peer-to-peer lending services company was used to identify the creditworthiness of borrowers.
  • The variables that were used to predict creditworthiness of borrowers were the value counts of the loan status. The two loan statuses used were Healthy (0) and High-Risk (1)
  • The Data was split into Training and Testing Sets by utilizing the train_test_split method
  • Two methods were used to analyze the Data:
    • A Logistic Regression Model was used with the original data
    • A Logistic Regression Model was created with Over-Sampled Training Data.

Results

  • Machine Learning Model 1: A Logistic Regression Model was used with the original data

    • Balanced Accuracy Score: 95%
    • Precision Score: 0-100% and 1-85% As can be seen the model was able to predict healthy loans better than high-risk loans
    • Recall Score: 0-99% and 1-91% As seen the model was able to correctly identify healthy loans better than high risk loans
  • Machine Learning Model 2: A Logistic Regression Model was created with Over-Sampled Training Data

    • Balanced Accuracy Score: 99%
    • Precision Score: 0-100% and 1-85% As can be seen the model was able to predict healthy loans better than high-risk loans
    • Recall Score: 0-99% and 1-91% As seen the model was able to correctly identify healthy loans better than high risk loans

Summary

The oversampled model generated a balanced accuracy score of 99% which is higher than the original model. This model also does better at catching false negative and false positive scores for both health and high-risk loans.