INTRODUCTION

Rapid Finance handles all types of house loans and has a presence across all metropolitan, semi-urban, and rural regions. A customer first applies for a house loan through ICAM, a standardized application, after which the firm verifies the customer's loan eligibility manually. Because of the unexpected growth the business experienced last quarter, there is a shortage of staff who can review the documents and decide whether to approve each application. The company therefore wishes to automate the loan eligibility check, in real time, based on the information supplied by the applicant. Automating loan approval will not only reduce the company's expenses but also make the business more scalable over the long term.

This is a classic supervised binary classification problem: we must predict whether or not a loan will be approved. The data set we will be using can be found here: https://www.kaggle.com/datasets/altruistdelhite04/loan-prediction-problem-dataset

DATA SET DESCRIPTION

The data set contains 614 rows and 13 columns: 8 categorical variables, 4 continuous variables, and 1 unique loan ID.

The data set attributes are listed below, along with their descriptions.

GETTING STARTED

The purpose of the following analysis is to predict, from the information an applicant provides, whether that applicant will be approved for a loan.

To do this, we will compare the following machine learning models:

  • Logistic Regression
  • K-Nearest Neighbour (KNN)
  • Support Vector Machine (SVM)
  • Decision Tree
  • Random Forest
  • Gradient Boosting

IMPORTING MODULES

PRELIMINARY ANALYSIS
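
The heading alone does not say what this step covered. A typical first look at a loan data set checks how many values are missing in each column and how balanced the approved and rejected classes are. The sketch below does both in plain Python. The toy records and the column names (`Gender`, `LoanAmount`, `Loan_Status`) are assumptions for illustration, not the original notebook's code or data.

```python
from collections import Counter

# Hypothetical toy records (the real data set has 614 rows and 13 columns).
# None marks a missing value.
records = [
    {"Gender": "Male", "LoanAmount": 128.0, "Loan_Status": "Y"},
    {"Gender": None, "LoanAmount": 66.0, "Loan_Status": "Y"},
    {"Gender": "Female", "LoanAmount": None, "Loan_Status": "N"},
    {"Gender": "Male", "LoanAmount": 120.0, "Loan_Status": "Y"},
]

def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)

def class_distribution(rows, target):
    """Return the share of each target class, e.g. approved vs. rejected."""
    labels = Counter(row[target] for row in rows)
    total = sum(labels.values())
    return {label: count / total for label, count in labels.items()}

print(missing_counts(records))                     # {'Gender': 1, 'LoanAmount': 1}
print(class_distribution(records, "Loan_Status"))  # {'Y': 0.75, 'N': 0.25}
```

A strongly imbalanced target is worth spotting early, because a model that always predicts the majority class can still post a deceptively high accuracy.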

PREPROCESSING THE DATA
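
Preprocessing usually means filling missing values and turning categories into numbers a model can use. One common approach, assumed here rather than taken from the original, fills categorical gaps with the most frequent value (the mode), fills numeric gaps with the median, and then label-encodes each category as an integer. The helper names and toy columns are hypothetical.

```python
from collections import Counter
from statistics import median

# Hypothetical toy columns with missing values marked as None.
gender = ["Male", None, "Female", "Male"]
loan_amount = [128.0, 66.0, None, 120.0]

def impute_mode(values):
    """Fill missing categorical values with the most frequent category."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def impute_median(values):
    """Fill missing numeric values with the median, which resists outliers."""
    med = median(v for v in values if v is not None)
    return [med if v is None else v for v in values]

def label_encode(values):
    """Map each category to an integer code (sorted for reproducibility)."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

gender_filled = impute_mode(gender)          # ['Male', 'Male', 'Female', 'Male']
amount_filled = impute_median(loan_amount)   # [128.0, 66.0, 120.0, 120.0]
gender_codes, mapping = label_encode(gender_filled)
print(gender_codes, mapping)                 # [1, 1, 0, 1] {'Female': 0, 'Male': 1}
```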

BEFORE SKEWNESS REDUCTION TREATMENT

AFTER SKEWNESS REDUCTION TREATMENT
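
Income and loan-amount columns in data like this are often right-skewed: most values are modest and a few are very large. A log transform is one common treatment that pulls in the long tail; the original treatment is not shown, so the choice of `log1p` (the log of 1 + x, which also handles zeros) is an assumption. The sketch uses hypothetical income values and compares the Fisher-Pearson skewness coefficient before and after the transform.

```python
import math

def skewness(values):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed incomes: most are modest, a few are very large.
incomes = [2500, 3000, 3200, 3500, 4000, 4200, 5000, 9000, 20000, 50000]

# log(1 + x) compresses large values far more than small ones.
log_incomes = [math.log1p(x) for x in incomes]

print(round(skewness(incomes), 2), round(skewness(log_incomes), 2))
```

A skewness near 0 indicates a roughly symmetric distribution; the transformed values score closer to 0 than the raw ones.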

CORRELATION MATRIX
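
A correlation matrix holds the Pearson coefficient r for every pair of numeric features, ranging from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive). The minimal sketch below computes it from scratch. The column names and values are hypothetical, with `loan_amount` deliberately set to exactly income / 40 so that pair has r = 1.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every pair of named numeric columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical numeric columns.
data = {
    "income": [2400, 3000, 4000, 6000, 8000],
    "loan_amount": [60, 75, 100, 150, 200],
    "term": [360, 180, 360, 240, 360],
}
matrix = correlation_matrix(data)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {r:+.2f}")
```

Highly correlated feature pairs carry largely redundant information, which can destabilize the coefficients of linear models such as logistic regression.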

FEATURE SEPARATION
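
Feature separation splits each row into the inputs X and the target y (approved or not). A share of the rows is then usually held out to test the models on data they have not seen. The helpers below are a hypothetical plain-Python sketch of both steps, using a fixed random seed so the split is reproducible; the 80/20 ratio and the toy rows are assumptions.

```python
import random

def separate_features(rows, target_index):
    """Split each row into a feature vector (X) and a target label (y)."""
    X = [row[:target_index] + row[target_index + 1:] for row in rows]
    y = [row[target_index] for row in rows]
    return X, y

def split_train_test(X, y, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed, then hold out a test fraction."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_size))
    test, train = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])

# Hypothetical encoded rows: [credit_history, income, loan_amount, approved].
rows = [[1, 5000, 130, 1], [1, 4500, 120, 0], [0, 3000, 70, 0], [1, 2500, 110, 1],
        [1, 6000, 140, 1], [0, 5500, 260, 0], [1, 2300, 90, 1], [1, 3000, 150, 1],
        [1, 4000, 160, 1], [0, 12000, 350, 0]]

X, y = separate_features(rows, target_index=3)
X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.2)
print(len(X_train), len(X_test))  # 8 2
```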

Logistic Regression
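
Logistic regression models the approval probability as sigmoid(w·x + b), where sigmoid(z) = 1 / (1 + e^(-z)), and learns w and b by minimizing the log-loss. The sketch below fits it with batch gradient descent in plain Python. The learning rate, epoch count and toy features are assumptions, and libraries such as scikit-learn (`LogisticRegression`) use more robust solvers.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # The gradient of the log-loss with respect to the score is (p - y).
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_logistic(w, b, X, threshold=0.5):
    """Predict 1 (approved) when the estimated probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold) for xi in X]

# Hypothetical features: [credit_history, standardized_income]; 1 = approved.
X = [[1, 0.5], [1, 1.2], [1, 0.3], [1, 0.9], [0, -0.4], [0, -1.0], [0, -0.2], [0, -0.7]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
print(predict_logistic(w, b, X))
```

The 0.5 threshold is a default, not a requirement; a lender wary of bad loans could raise it to approve fewer applicants.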

K-Nearest Neighbour (KNN)
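
K-nearest neighbours stores the training data and classifies a new applicant by majority vote among the k closest training points. Because it relies on distances, features should be on comparable scales (for example, standardized) so that a large-valued column such as income does not dominate. The sketch assumes Euclidean distance (`math.dist`, Python 3.8+), k = 3, and hypothetical, already-scaled toy points.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    neighbours = sorted((math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical scaled features; 0 = rejected, 1 = approved.
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
y_train = [0, 0, 0, 1, 1, 1]

print(knn_predict(X_train, y_train, [1.1, 1.0]))  # 0
print(knn_predict(X_train, y_train, [3.0, 3.0]))  # 1
```

With two classes, an odd k avoids tied votes.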

Support Vector Machine (SVM)
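
A linear support vector machine looks for the hyperplane w·x + b = 0 that separates the classes with the widest margin. This amounts to minimizing (lambda / 2)·||w||^2 plus the average hinge loss max(0, 1 - y(w·x + b)), with labels y in {-1, +1}. The sketch below uses the Pegasos stochastic subgradient method as one simple way to minimize that objective. It is not necessarily what the original analysis used: kernel SVMs, such as scikit-learn's `SVC`, are solved differently. The regularization strength, epoch count and toy data are assumptions.

```python
import random

def fit_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    Labels must be -1 or +1. A constant 1.0 is appended to every x so the last
    weight acts as the bias (which is then regularized too, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(xi) + [1.0], yi) for xi, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for xi, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization step
            if margin < 1:  # hinge loss active: step toward classifying xi correctly
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict_svm(w, X):
    """Return +1 or -1 depending on which side of the hyperplane each x falls."""
    return [1 if sum(wj * xj for wj, xj in zip(w, list(xi) + [1.0])) >= 0 else -1 for xi in X]

# Hypothetical scaled features; +1 = approved, -1 = rejected.
X = [[1.0, 1.2], [1.5, 0.8], [0.9, 1.4], [1.3, 1.1],
     [-1.0, -0.9], [-1.4, -1.2], [-0.8, -1.3], [-1.1, -0.7]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w = fit_linear_svm(X, y)
print(predict_svm(w, X))
```

Approval labels stored as 0/1 need mapping to -1/+1 before this kind of fit.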

Decision Tree
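
A decision tree repeatedly splits the data on the feature and threshold that make the resulting groups purest. A common purity measure, and the default criterion in scikit-learn's `DecisionTreeClassifier`, is Gini impurity, G = 1 - sum over classes k of p_k^2, where p_k is the share of class k in a node. The sketch below grows a small tree greedily with a depth limit; the toy data and the `max_depth` value are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Return (feature, threshold, weighted_gini) of the split that most reduces impurity."""
    best = (None, None, gini(y))
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, threshold, score)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Grow the tree recursively; leaves hold the majority class."""
    feature, threshold, _ = best_split(X, y)
    if feature is None or depth == max_depth:
        return max(set(y), key=y.count)
    left = [i for i, row in enumerate(X) if row[feature] <= threshold]
    right = [i for i, row in enumerate(X) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }

def predict_tree(node, x):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

# Hypothetical features: [credit_history, income_in_thousands]; 1 = approved.
X = [[1, 3], [1, 5], [0, 4], [0, 6], [1, 2], [0, 3]]
y = [1, 1, 0, 0, 1, 0]
tree = build_tree(X, y)
print(tree)  # {'feature': 0, 'threshold': 0, 'left': 0, 'right': 1}
```

Limiting depth is one guard against a tree that memorizes the training rows instead of generalizing.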

Random Forest
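
A random forest trains many decision trees, each on a bootstrap sample of the rows (drawn with replacement) and each considering only a random subset of features at every split, then predicts by majority vote. The randomness decorrelates the trees, so their combined vote varies less than any single tree's prediction. The sketch below follows that recipe in plain Python with small depth-limited trees. The number of trees, the depth, the sqrt(n_features) subset size (a common default for classification) and the toy data are assumptions.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_split(X, y, features):
    """Best (feature, threshold) among a candidate subset of features, or None."""
    best, best_score = None, gini(y)
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if score < best_score:
                    best, best_score = (j, t), score
    return best

def build_tree(X, y, rng, max_depth, n_sub):
    """Grow a tree, sampling n_sub candidate features at every split."""
    features = rng.sample(range(len(X[0])), n_sub)
    split = best_split(X, y, features) if max_depth > 0 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, max_depth - 1, n_sub),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, max_depth - 1, n_sub))

def predict_tree(node, x):
    """Internal nodes are (feature, threshold, left, right) tuples; leaves are labels."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def fit_forest(X, y, n_trees=25, max_depth=3, seed=0):
    """Bagging: each tree sees a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))  # sqrt(n_features) candidates per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, max_depth, n_sub))
    return forest

def predict_forest(forest, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]

# Hypothetical features: [credit_history, income_k, loan_k]; 1 = approved.
X = [[1, 6, 5], [1, 7, 6], [1, 8, 7], [1, 5, 6], [0, 2, 1], [0, 3, 2], [0, 1, 2], [0, 2, 3]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(X, y)
print(predict_forest(forest, [1, 8, 7]), predict_forest(forest, [0, 1, 1]))  # 1 0
```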

Gradient Boosting
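
Gradient boosting builds an additive model F(x) one small tree at a time. For binary classification with the log-loss, each new tree is fitted to the negative gradient, which is simply the residual y - p with p = sigmoid(F(x)), and its output is added to F after being scaled by a learning rate. The sketch below uses one-split regression trees (stumps) and sets each leaf to the mean residual. Many implementations instead refine leaf values with a Newton step, so treat this as a first-order illustration; the round count, learning rate and toy data are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, r):
    """One-split regression tree minimizing squared error on the residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]  # (feature, threshold, left_value, right_value)

def fit_gradient_boosting(X, y, n_rounds=50, lr=0.3):
    """Boost stumps on the log-loss. Needs both classes and a non-constant feature."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))  # start from the prior log-odds
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]  # negative gradient
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        F = [fi + lr * (lv if row[j] <= t else rv) for fi, row in zip(F, X)]
    return f0, lr, stumps

def predict_proba(model, x):
    """Probability of class 1 (approved) under the boosted model."""
    f0, lr, stumps = model
    score = f0 + sum(lr * (lv if x[j] <= t else rv) for j, t, lv, rv in stumps)
    return sigmoid(score)

# Hypothetical features: [credit_history, income_k]; 1 = approved.
X = [[1, 6], [1, 7], [1, 3], [0, 5], [0, 2], [0, 4], [1, 5], [0, 6]]
y = [1, 1, 1, 0, 0, 0, 1, 0]
model = fit_gradient_boosting(X, y)
print([round(predict_proba(model, x), 2) for x in X])
```

A smaller learning rate usually needs more rounds but tends to generalize better, which is the main trade-off when tuning boosting.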

Model Performance Comparison
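
Comparing models on accuracy alone can hide important trade-offs. For loan approval, precision (of the applicants predicted to be approved, the share who truly were) and recall (of the truly approved applicants, the share the model identified) capture different costs, and F1 is their harmonic mean. The sketch below computes these metrics from scratch; the labels and the two models' predictions are made up for illustration and are not results from this analysis.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive (approved) class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test labels and predictions from two models (not real results).
y_test = [1, 1, 1, 0, 0, 1, 0, 1]
predictions = {
    "model_a": [1, 1, 0, 0, 1, 1, 0, 1],
    "model_b": [1, 1, 1, 1, 1, 1, 0, 1],
}
for name, y_pred in predictions.items():
    m = classification_metrics(y_test, y_pred)
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Here both hypothetical models reach the same accuracy of 0.75, yet model_b trades precision (about 0.714 against 0.8) for perfect recall. That distinction matters if approving an unsuitable applicant costs the lender more than rejecting a suitable one.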