Loan default prediction

Key findings: 1) Borrowing for a small business, not meeting the lender's credit policy, a higher interest rate, and a low FICO score are each associated with high default risk. 2) The Logistic Regression model performs best in predicting loan default.

Business problem

Online lending platforms have grown rapidly in recent years thanks to their convenience and feasibility. However, they face various difficulties related to loan default, because their clients are mainly individuals, small business owners, and low-income borrowers who have been rejected by traditional banks.

Aims of the project

  1. Identify factors associated with repayment failures based on financial information provided by customers.
  2. Train a Machine Learning model capable of distinguishing defaulters from non-defaulters based on clients’ financial information, in order to support loan approval decisions.

Data source:

https://www.kaggle.com/datasets/itssuru/loan-data

Research Design

Picture 1

Quick glance at the results

The data analysis showed that borrowers who do not meet the lender's credit policy, have a higher interest rate, and have a low FICO score are associated with a high default risk. High risk is also observed in customers whose purpose of borrowing is listed as “small business”.

Evidence is shown in the figures and table below:

image

Counts of clients according to credit criteria (left) and percentage of fully paid/not fully paid clients in each type of credit policy (right).

image

Scatterplot of interest rate and fico, grouped by loan paid.

image

Among customers who do not meet the credit policy, have an interest rate greater than 0.1, and have a FICO score below 737, 28.59% did not fully pay their loan.
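A minimal sketch of how a segment default rate like the one above could be computed. The records are made-up toy rows, not the Kaggle data, and the field names (`credit_policy`, `int_rate`, `fico`, `not_fully_paid`) are illustrative stand-ins for the dataset's columns, so the printed figure will not match 28.59%.

```python
# Toy records (made up for illustration; not taken from the dataset).
loans = [
    {"credit_policy": 0, "int_rate": 0.14, "fico": 667, "not_fully_paid": 1},
    {"credit_policy": 0, "int_rate": 0.12, "fico": 702, "not_fully_paid": 0},
    {"credit_policy": 0, "int_rate": 0.15, "fico": 682, "not_fully_paid": 1},
    {"credit_policy": 0, "int_rate": 0.11, "fico": 722, "not_fully_paid": 0},
    {"credit_policy": 1, "int_rate": 0.08, "fico": 767, "not_fully_paid": 0},
    {"credit_policy": 0, "int_rate": 0.09, "fico": 742, "not_fully_paid": 0},
]


def segment_default_rate(rows, min_rate=0.1, max_fico=737):
    """Share of not-fully-paid loans among rows that fail the credit policy,
    have an interest rate above min_rate, and a FICO score below max_fico."""
    segment = [
        r for r in rows
        if r["credit_policy"] == 0
        and r["int_rate"] > min_rate
        and r["fico"] < max_fico
    ]
    if not segment:
        return None  # no rows match the filter
    return sum(r["not_fully_paid"] for r in segment) / len(segment)


rate = segment_default_rate(loans)
print(f"{rate:.2%} of the segment did not fully pay")
```

The thresholds are keyword arguments so that other cut-offs can be tried without changing the function.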

image

Percentage of not_fully_paid/fully_paid clients by purpose of borrowing.
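A breakdown like this could be computed along the following lines. This is only a sketch: the (purpose, not_fully_paid) pairs are invented for illustration and the purpose labels are assumed names, so the shares do not reflect the real data.

```python
from collections import defaultdict

# Made-up (purpose, not_fully_paid) pairs, for illustration only.
records = [
    ("small_business", 1),
    ("small_business", 0),
    ("debt_consolidation", 0),
    ("debt_consolidation", 0),
    ("debt_consolidation", 0),
    ("debt_consolidation", 1),
]


def default_share_by_purpose(pairs):
    """Return {purpose: fraction of loans not fully paid}."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for purpose, not_fully_paid in pairs:
        totals[purpose] += 1
        defaults[purpose] += not_fully_paid
    return {purpose: defaults[purpose] / totals[purpose] for purpose in totals}


shares = default_share_by_purpose(records)
for purpose, share in sorted(shares.items()):
    print(f"{purpose}: {share:.0%} not fully paid")
```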

All classification models were successfully built with satisfactory performance, with XGBoost and Random Forest accuracy exceeding 80%.

image

Logistic Regression performed best at predicting loan defaulters: 83% of defaulted loans were correctly identified when the decision threshold was set to 0.4.
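The effect of the decision threshold on the share of defaulters caught (recall) can be illustrated with a small sketch. The labels and probabilities below are made up, and treating a probability equal to the threshold as a predicted default is an assumption, not necessarily the convention used in this project.

```python
def recall_at_threshold(y_true, y_prob, threshold=0.4):
    """Fraction of actual defaulters (label 1) whose predicted default
    probability is at or above the threshold."""
    flagged = [p >= threshold for p in y_prob]
    true_pos = sum(1 for y, f in zip(y_true, flagged) if y == 1 and f)
    actual_pos = sum(1 for y in y_true if y == 1)
    return true_pos / actual_pos if actual_pos else 0.0


# Made-up labels and predicted default probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.45, 0.62, 0.30, 0.41, 0.10, 0.55, 0.48, 0.20]

print(recall_at_threshold(y_true, y_prob, threshold=0.5))
print(recall_at_threshold(y_true, y_prob, threshold=0.4))
```

Lowering the threshold flags more loans as likely defaults, which raises recall but also produces more false alarms among loans that would have been repaid.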

image