Loan-Default-Prediction

Risk Analysis and Profit Optimization for loans

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Overview

Using loan data from Lending Club, we apply machine learning to predict the risk of loan default. We then use the model's predictions to improve the potential return on investment.
Details of the implementation can be found here.

Table of Contents

  1. Introduction
  2. Description of Data
  3. Model Details
  4. Model Results
  5. Return on Investment

Introduction

Suppose a bank wants to know whether potential loan applicants will default on a loan. Given a client's loan information, the model predicts a binary outcome: fully paid or default. We train logistic regression, random forest, neural network, XGBoost, and ensemble classifiers to build this model. Its predictions provide useful metrics and help improve the company's return on investment.

Description of Data

Set of features:

Each row represents a client's financial information. Example row (index 0), listed one feature per line:

feature                 value
loan_amnt               5000
term                    36 months
int_rate                7.35%
installment             155.19
grade                   A
emp_length              5 years
home_ownership          MORTGAGE
annual_inc              60000.0
verification_status     Not Verified
loan_status             Fully Paid
purpose                 car
dti                     15.76
delinq_2yrs             0
earliest_cr_line        Oct-04
open_acc                12
pub_rec                 0
revol_bal               3697
revol_util              13.20%
total_acc               25
initial_list_status     w
total_pymnt             5385.245133
application_type        Individual
mort_acc                1
pub_rec_bankruptcies    0
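
Several of these fields arrive as strings ("36 months", "7.35%", "Oct-04") and need to be numeric before a classifier can use them. The sketch below shows one possible way to convert them. The function name `clean_row`, the output key `fully_paid`, and the choice of encodings are illustrative assumptions, not the repository's actual preprocessing.

```python
import re
from datetime import datetime


def clean_row(raw):
    """Convert the string-typed fields of one raw loan record (hypothetical helper)."""
    row = dict(raw)
    # "36 months" -> 36
    row["term"] = int(raw["term"].split()[0])
    # "7.35%" -> 0.0735, "13.20%" -> 0.132
    row["int_rate"] = float(raw["int_rate"].rstrip("%")) / 100
    row["revol_util"] = float(raw["revol_util"].rstrip("%")) / 100
    # "5 years" -> 5 (takes the first integer; other formats may need their own rule)
    row["emp_length"] = int(re.search(r"\d+", raw["emp_length"]).group())
    # "Oct-04" -> datetime(2004, 10, 1); assumes English month abbreviations
    row["earliest_cr_line"] = datetime.strptime(raw["earliest_cr_line"], "%b-%y")
    # Binary target (assumed encoding): 1 = fully paid, 0 = anything else
    row["fully_paid"] = 1 if raw["loan_status"] == "Fully Paid" else 0
    return row


# A subset of the example row from the table above
sample = {
    "loan_amnt": 5000,
    "term": "36 months",
    "int_rate": "7.35%",
    "emp_length": "5 years",
    "earliest_cr_line": "Oct-04",
    "revol_util": "13.20%",
    "loan_status": "Fully Paid",
}
cleaned = clean_row(sample)
```

Categorical fields such as grade, home_ownership, and purpose would still need an encoding step (for example one-hot encoding) before modeling.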

Model Details

Many models were trained, but the final model chosen is a stacking ensemble.
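
As a rough illustration of stacking, the sketch below uses scikit-learn's `StackingClassifier` on synthetic data. The choice of base learners, the logistic regression meta-learner, and all hyperparameters are assumptions made for the example. `GradientBoostingClassifier` stands in for XGBoost so the snippet depends only on scikit-learn. The repository's actual configuration may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned loan features (label 1 = fully paid)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners (assumed): their cross-validated predictions become
# the input features of the meta-learner.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Meta-learner (assumed): logistic regression over the base predictions
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
# Predicted probability of the positive class for each test loan
p_fully_paid = stack.predict_proba(X_test)[:, 1]
```

The per-loan probabilities in `p_fully_paid` are the kind of output that a return analysis can then group into ranges.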

Model Results

Model Accuracy: 68%
Overall return without model: -20.62%
Overall return with model: -7.90%
Overall percent improvement: 84.04%
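
The formula behind these return figures is not stated here. A minimal sketch, assuming return is defined as (total payments received - total amount lent) / total amount lent, and that "with model" means funding only loans whose predicted probability of full repayment meets a threshold, could look like this. The function names, the threshold, and the toy numbers are hypothetical and are not meant to reproduce the figures above.

```python
def overall_return(loans):
    """(total received - total lent) / total lent, under the assumed definition."""
    lent = sum(loan["loan_amnt"] for loan in loans)
    received = sum(loan["total_pymnt"] for loan in loans)
    return (received - lent) / lent


def return_with_model(loans, p_fully_paid, threshold=0.5):
    """Return on only those loans the model would approve (p >= threshold)."""
    funded = [loan for loan, p in zip(loans, p_fully_paid) if p >= threshold]
    # If nothing is funded, nothing is lent, so the return is taken as 0
    return overall_return(funded) if funded else 0.0


# Toy portfolio: two loans repaid with interest, two largely defaulted
loans = [
    {"loan_amnt": 5000, "total_pymnt": 5385.25},
    {"loan_amnt": 10000, "total_pymnt": 2500.0},
    {"loan_amnt": 8000, "total_pymnt": 8900.0},
    {"loan_amnt": 6000, "total_pymnt": 1000.0},
]
# Hypothetical model outputs: probability that each loan is fully paid
p_fully_paid = [0.9, 0.3, 0.8, 0.6]

baseline = overall_return(loans)
with_model = return_with_model(loans, p_fully_paid, threshold=0.5)
```

In this toy case the model rejects one of the two defaulted loans, so `with_model` is higher than `baseline` even though both are negative.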

Returns

  • Left: A loan that the model predicts to be fully paid with 75% probability falls in the 70% - 80% range
  • Right: For loans predicted to be fully paid with 75% probability, the average improvement in return is 18%
  • The improvement in return grows with the predicted probability up to the 80% – 100% range. In that range few to no loans default, so there is little opportunity to improve returns
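
The grouping into probability ranges can be sketched as follows. This is an illustrative example with hypothetical helper names and toy data. It assigns each predicted probability to a 10-point range and computes the share of defaulted loans per range.

```python
from collections import defaultdict


def probability_bin(p):
    """Map a probability in [0, 1] to a 10-point range label, e.g. 0.75 -> '70% - 80%'."""
    pct = round(p * 100, 6)          # rounding avoids float artifacts such as 0.7 * 100
    idx = min(int(pct // 10), 9)     # a probability of exactly 1.0 goes in the top range
    low = idx * 10
    return f"{low}% - {low + 10}%"


def default_rate_by_bin(p_fully_paid, defaulted):
    """Share of defaulted loans within each probability range."""
    counts = defaultdict(lambda: [0, 0])  # label -> [n_loans, n_defaults]
    for p, did_default in zip(p_fully_paid, defaulted):
        entry = counts[probability_bin(p)]
        entry[0] += 1
        entry[1] += int(did_default)
    return {label: n_def / n for label, (n, n_def) in counts.items()}


# Toy data: predicted probability of full repayment and the actual outcome
probs = [0.75, 0.72, 0.95, 0.91, 0.55]
defaulted = [True, False, False, False, True]
rates = default_rate_by_bin(probs, defaulted)
```

A range with a default rate near zero leaves little for the model to filter out, which matches the flattening described for the highest probabilities.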