credit-risk-classification

Background

In this Challenge, you’ll use various techniques to train and evaluate a model based on loan risk. You’ll use a dataset of historical lending activity from a peer-to-peer lending services company to build a model that can identify the creditworthiness of borrowers.

Instructions

The instructions for this Challenge are divided into the following subsections:

Split the Data into Training and Testing Sets

Create a Logistic Regression Model with the Original Data

Write a Credit Risk Analysis Report

Credit Risk Analysis Report

The purpose of this analysis is to train and evaluate a model on historical lending data in order to assess loan risk. From this analysis we can identify the creditworthiness of borrowers.

I used the data provided in the CSV file that included the loan size, interest rates, borrower income, debt-to-income ratio, derogatory marks, total debt, and current loan status.

I trained and evaluated a logistic regression model to classify each loan as a healthy or a risky investment.

Separating the data into training and testing sets let me fit the model on one portion and evaluate it on data it had not seen.
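The split-and-fit workflow described above can be sketched with scikit-learn. This is a minimal sketch, not the notebook's actual code: it uses synthetic data from `make_classification` as a stand-in for the CSV file, and the sample size, feature count, class weights, `random_state`, and `max_iter` values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lending data (assumption): 7 numeric features,
# with class 1 (risky) much rarer than class 0 (healthy).
X, y = make_classification(
    n_samples=2000,
    n_features=7,
    n_informative=4,
    weights=[0.95, 0.05],
    random_state=1,
)

# Hold out 25% of the rows (the default) as a test set; stratify keeps the
# class mix similar in the training and testing portions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)

# Fit on the training data only, then predict on the unseen test data.
model = LogisticRegression(max_iter=1000, random_state=1)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# In scikit-learn's confusion matrix, rows are actual classes and
# columns are predicted classes.
matrix = confusion_matrix(y_test, predictions)
print(matrix)
print(classification_report(y_test, predictions))
```

With the real data, `X` would hold the feature columns from the CSV file and `y` the loan status column.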

Results

The logistic regression and confusion matrix results showed the following:

  • 18679 true negatives with 80 false negatives, an error rate of about 0.4% (80 of 18759).
  • 67 false positives with 558 true positives, an error rate of about 11% (67 of 625).

The classification report showed an overall accuracy of 99%.

  • Healthy Loans (0)

    • Precision 100%
    • Recall 100%
    • F1-Score 100%
  • Risky Loans (1)

    • Precision 89%
    • Recall 87%
    • F1-Score 88%
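The report figures can be cross-checked by recomputing them from the four confusion-matrix counts. The sketch below assumes the counts are labeled as in the Results list (80 false negatives, 67 false positives) and treats risky loans (1) as the positive class; the helper name `precision_recall_f1` is hypothetical.

```python
# Counts as reported in the Results section. Which of the two error counts
# is the false negatives follows the labels in the text (an assumption).
tn, fn = 18679, 80
fp, tp = 67, 558


def precision_recall_f1(hits, false_alarms, misses):
    """Return (precision, recall, f1) for one class.

    hits: cases of the class that were predicted correctly
    false_alarms: cases predicted as the class that belong to the other class
    misses: cases of the class that were predicted as the other class
    """
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Overall accuracy: correct predictions over all predictions.
accuracy = (tn + tp) / (tn + fn + fp + tp)

# Risky loans (1) as the class of interest.
risky = precision_recall_f1(tp, fp, fn)
# Healthy loans (0): the two error types swap roles.
healthy = precision_recall_f1(tn, fn, fp)

print(f"accuracy: {accuracy:.2f}")
for name, (p, r, f) in {"healthy (0)": healthy, "risky (1)": risky}.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Rounded to whole percentages, this gives 99% accuracy, an F1-score of 100% for healthy loans, and an F1-score of 88% for risky loans.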

Summary

The logistic regression model, checked against the confusion matrix, gave an accurate analysis of creditworthiness, with 99% accuracy on the test data. It successfully showed whether a loan would be a healthy or a risky investment, although the confusion matrix still shows some misclassified loans (80 false negatives and 67 false positives). The model was fit on the training data, and the confusion matrix tested its accuracy on the testing data. More data would likely improve the results. For this data, I would recommend this type of model as part of due diligence when evaluating credit risk.