In this Challenge, you’ll use various techniques to train and evaluate a model based on loan risk. You’ll use a dataset of historical lending activity from a peer-to-peer lending services company to build a model that can identify the creditworthiness of borrowers.
The instructions for this Challenge are divided into the following subsections:
- Split the Data into Training and Testing Sets
- Create a Logistic Regression Model with the Original Data
- Write a Credit Risk Analysis Report
The purpose of this analysis is to use historical lending data to train and evaluate a model that predicts loan risk. From this analysis we can identify the creditworthiness of borrowers.
I used the data provided in the CSV file that included the loan size, interest rates, borrower income, debt-to-income ratio, derogatory marks, total debt, and current loan status.
I used a logistic regression model to classify each loan as either a healthy or a risky investment.
Separating the data into training and testing sets let me fit the model on one portion of the data and evaluate it on data it had not seen.
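The split-then-fit workflow described above can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the code used for this analysis: the data is synthetic (generated with `make_classification` instead of being loaded from the CSV file), and the `random_state`, `stratify`, and `max_iter` settings are choices made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lending data (assumption): 7 numeric features,
# with roughly 5% of rows in the minority "risky" class (label 1).
X, y = make_classification(
    n_samples=2000, n_features=7, weights=[0.95, 0.05], random_state=1
)

# Hold out 25% of the rows for testing (scikit-learn's default test size).
# stratify=y keeps the healthy/risky ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)

# Fit on the training set only, then predict on the unseen testing set.
model = LogisticRegression(max_iter=1000, random_state=1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=["healthy (0)", "risky (1)"]))
```

With the real dataset, `X` would hold the feature columns and `y` the loan status column.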
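The logistic regression model's confusion matrix showed the following:
- Healthy loans: 18,679 true negatives and 80 false positives, so about 0.4% of the actual healthy loans were misclassified as risky.
- Risky loans: 558 true positives and 67 false negatives, so about 11% of the actual risky loans were misclassified as healthy.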
The classification report showed an overall accuracy of 99%.
- Healthy Loans (0)
  - Precision 100%
  - Recall 100%
  - F1-Score 100%
- Risky Loans (1)
  - Precision 87%
  - Recall 89%
  - F1-Score 88%
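As a cross-check, the report metrics can be recomputed from the four confusion-matrix counts given above. The sketch below is illustrative only. It assumes scikit-learn's layout (rows are actual classes, columns are predicted classes), under which the 80 errors are healthy loans predicted as risky and the 67 errors are risky loans predicted as healthy. The helper name `precision_recall_f1` is hypothetical.

```python
# Counts from the confusion matrix above. Assigning 80 to false positives and
# 67 to false negatives assumes rows = actual class, columns = predicted class.
tn, fp, fn, tp = 18679, 80, 67, 558


def precision_recall_f1(correct, wrongly_assigned, missed):
    """Return (precision, recall, F1) for one class.

    correct:          rows of the class that were predicted as that class
    wrongly_assigned: rows of the other class predicted as this class
    missed:           rows of this class predicted as the other class
    """
    precision = correct / (correct + wrongly_assigned)
    recall = correct / (correct + missed)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


accuracy = (tn + tp) / (tn + fp + fn + tp)
healthy = precision_recall_f1(tn, fn, fp)  # class 0
risky = precision_recall_f1(tp, fp, fn)    # class 1

print(f"accuracy: {accuracy:.2f}")
print("healthy (precision, recall, f1):", [round(v, 2) for v in healthy])
print("risky   (precision, recall, f1):", [round(v, 2) for v in risky])
```

Because F1 is the harmonic mean of precision and recall, a class with precision and recall both near 100% must also have an F1-score near 100%.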
The logistic regression model, evaluated with the confusion matrix and classification report, gave an accurate assessment of creditworthiness on this dataset. It successfully distinguished healthy loans from risky ones, though the confusion matrix shows that it misclassifies a larger share of risky loans than healthy loans. The training data was used to fit the model, and the confusion matrix on the testing data gave us a test of its accuracy. More data would likely improve the results further. For this data, I would recommend using this type of model as part of due diligence when evaluating credit risk.