Overview

Using loan data from Lending Club, we apply machine learning to predict the risk of loan default. We then use the model's predictions to improve the potential return on investment.
Details of the implementation can be found here.

Introduction

Suppose a bank wants to know whether a potential loan applicant will default on a loan. Given a client's loan information, the model predicts a binary outcome: fully paid or default. We use logistic regression, random forest, neural network, XGBoost, and ensemble classifiers to build the model. The predictions provide useful metrics and help improve the company's return on investment.

Description of Data

Set of features:

Each row represents one client's loan and financial information. An example row:

	loan_amnt	term	int_rate	installment	grade	emp_length	home_ownership	annual_inc	verification_status	loan_status	purpose	dti	delinq_2yrs	earliest_cr_line	open_acc	pub_rec	revol_bal	revol_util	total_acc	initial_list_status	total_pymnt	application_type	mort_acc	pub_rec_bankruptcies
0	5000	36 months	7.35%	155.19	A	5 years	MORTGAGE	60000.0	Not Verified	Fully Paid	car	15.76	0	Oct-04	12	0	3697	13.20%	25	w	5385.245133	Individual	1	0
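Several fields in the example row are stored as strings ("36 months", "7.35%", "5 years", "Oct-04") and would need to be converted before modeling. The following is a minimal sketch of one way to do that using only the Python standard library. The helper name parse_row, the 0/1 target encoding, and the choice to keep only the first integer of emp_length are assumptions made for illustration, not the repository's actual preprocessing; other value formats in the full dataset would need their own handling.

```python
import re
from datetime import datetime


def parse_row(row):
    """Convert the string-typed fields of one raw loan record to numbers.

    Hypothetical helper: the field names come from the table above, but the
    conversions are illustrative assumptions.
    """
    parsed = dict(row)
    parsed["term"] = int(row["term"].split()[0])                    # "36 months" -> 36
    parsed["int_rate"] = float(row["int_rate"].rstrip("%")) / 100   # "7.35%" -> 0.0735
    parsed["revol_util"] = float(row["revol_util"].rstrip("%")) / 100
    # Keep the first integer found, e.g. "5 years" -> 5 (a simplification).
    parsed["emp_length"] = int(re.search(r"\d+", row["emp_length"]).group())
    # "Oct-04" -> first day of October 2004.
    parsed["earliest_cr_line"] = datetime.strptime(row["earliest_cr_line"], "%b-%y")
    # Binary target: 1 = fully paid, 0 = anything else (treated as default).
    parsed["target"] = 1 if row["loan_status"] == "Fully Paid" else 0
    return parsed


# A subset of the fields from the example row above.
example = {
    "loan_amnt": 5000,
    "term": "36 months",
    "int_rate": "7.35%",
    "emp_length": "5 years",
    "loan_status": "Fully Paid",
    "earliest_cr_line": "Oct-04",
    "revol_util": "13.20%",
}
parsed = parse_row(example)
```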

Model Details

Several models were trained and evaluated, but the final model chosen is an ensemble built with the stacking method.
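To illustrate what stacking means here, the sketch below combines two base classifiers with a logistic regression meta-learner using scikit-learn's StackingClassifier. It is only an example under assumptions: the data is synthetic (make_classification), and the choice of base learners, meta-learner, and hyperparameters is illustrative. The README does not state which models make up the final stack.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan features (assumption: the real data is not used here).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Base learners produce out-of-fold predictions (cv=5); the final estimator
# is then trained on those predictions.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

# Probability of the positive class (label 1, standing in for "fully paid").
proba_fully_paid = stack.predict_proba(X_test)[:, 1]
accuracy = stack.score(X_test, y_test)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for rows the base models have already seen, is what keeps the stack from simply memorizing the base models' training fit.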

Model Results

Model Accuracy: 68%
Overall return without model: -20.62%
Overall return with model: -7.90%
Overall percent improvement: 84.04%
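The comparison of returns with and without the model can be sketched as follows. This assumes that return is measured as total payments received minus the amount funded, divided by the amount funded (using the loan_amnt and total_pymnt columns), and that "with model" means funding only loans the model predicts will be fully paid. Both are assumptions, since the exact definitions are not given here. The four loans, the p_paid field, and the 0.5 threshold are made-up illustrative values.

```python
def portfolio_return(loans):
    """Return (total repaid - total funded) / total funded for a list of loans."""
    funded = sum(loan["loan_amnt"] for loan in loans)
    repaid = sum(loan["total_pymnt"] for loan in loans)
    return (repaid - funded) / funded


# Toy portfolio; p_paid is a hypothetical predicted probability of full payment.
loans = [
    {"loan_amnt": 5000, "total_pymnt": 5400, "p_paid": 0.90},
    {"loan_amnt": 10000, "total_pymnt": 3000, "p_paid": 0.40},
    {"loan_amnt": 8000, "total_pymnt": 8800, "p_paid": 0.70},
    {"loan_amnt": 6000, "total_pymnt": 2000, "p_paid": 0.55},
]

# Without the model: fund every loan.
return_without_model = portfolio_return(loans)

# With the model: fund only loans predicted fully paid (assumed threshold 0.5).
selected = [loan for loan in loans if loan["p_paid"] >= 0.5]
return_with_model = portfolio_return(selected)
```

In this toy example both returns are negative, but filtering out the loan with the lowest predicted probability reduces the loss, which is the same kind of effect the figures above report.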

Left: A loan that the model predicts will be fully paid with 75% probability is placed in the 70% - 80% range.
Right: For a loan predicted at 75% probability, the average improvement in return is 18%.
The improvement in return grows as the predicted probability increases, up to the 80% - 100% range. In that range few to no loans default, so there is little opportunity to improve returns.
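The grouping into probability ranges described above can be sketched in a few lines. The function names, the 10-point bin width, and the toy predictions are assumptions for illustration. The per-range default rate is computed only to show why the highest ranges leave little room for improvement; it is not the improvement metric reported above.

```python
from collections import defaultdict


def bin_label(p):
    """Map a probability in [0, 1] to its 10-point range, e.g. 0.75 -> '70% - 80%'."""
    low = min(int(p * 10), 9) * 10  # a probability of exactly 1.0 goes in the top range
    return f"{low}% - {low + 10}%"


def default_rate_by_bin(predictions):
    """predictions: iterable of (probability_fully_paid, defaulted) pairs."""
    counts = defaultdict(lambda: [0, 0])  # label -> [number of loans, number of defaults]
    for p, defaulted in predictions:
        entry = counts[bin_label(p)]
        entry[0] += 1
        entry[1] += int(defaulted)
    return {label: n_defaults / n_loans for label, (n_loans, n_defaults) in counts.items()}


# Made-up predictions: (predicted probability of full payment, whether the loan defaulted).
toy_predictions = [
    (0.75, False),
    (0.72, True),
    (0.95, False),
    (0.91, False),
    (0.35, True),
]
rates = default_rate_by_bin(toy_predictions)
```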

ZhuLeon/Loan-Default-Prediction
