LOAN APPROVAL FORECASTING

Overview

The application uses a trained machine learning model to predict the issuance of a loan to a person. The model is deployed using Streamlit. The app should help people determine whether they will be eligible for a loan or not.

Technologies used

Features

Metrics

METRIC: RECALL

It is required to identify all unreliable clients. If the prediction is not clear, then it is worth rejecting this client in order to avoid subsequent problems with him. It is for this reason that the best metric for this would be recall.

Confusion matrix of GradientBoostingClassifier

Methods

Object-Oriented Programming (OOP)
Weight of Evidence(WOE), Information Value(IV)
IQR
Data Analysis
Data Visualization
Data Cleaning
Feature Engineering
Fine-tuning hyperparameters
User Interface Design
Model Deployment using Streamlit
Integration of User Inputs
Data Upload Mechanism
New Data Handling in the Web App

Problems encountered in the process

SMOTE

As you know, SMOTE is not recommended to be applied to categorical data. It uses KNN to create new samples to solve the problem of data imbalance. Since 90% of dataset is categorical data, using this method will lead to anomalies in the new samples created by SMOTE. Basically, getting numbers other than 1 or 0 in binary columns, getting numeric values in several columns of one hot encoded column at once. All this leads to the fact that a mixture of abnormal and normal data gets into the training data, on which the model is then trained. The model may have good performance when evaluated on training data, but in fact, when predicting on new data(data entered by the user), the model's performance will be many times worse.

To deal with this problem, I used the compute_sample_weight method from the scikit-learn library. With SMOTE method, 0 class: 0.90, 1 class: 0.91, and after weights sampling, 0 class: 0.91, 1 class: 0.80.

Example and Usage

You can check how this app works by yourself, just click on OPEN ON STREAMLIT badge.

This app is using trained machine learning model to predict loan approval based on person's data. If you do not have any income, then you are denied automatically. The app assumes that you already have a credit history of at least 12 months. Fill in the fields below and click Predict.

Some characteristics may be missing, for example, you work as a software engineer, but there is no this profession in the occupation type menu, then you should choose the most similar activity to yours.

Getting started

This step involves cloning the repository from the provided GitHub URL. After cloning, change your working directory to the newly cloned repository with cd loan_approval_forecasting:

git clone https://github.com/dysff/loan_approval_forecasting.git
cd loan_approval_forecasting

Create a virtual environment(optional but recommended):

python -m venv venv

Activate the virtual environment:

Windows:
venv\Scripts\activate

Unix/MacOS:
source venv/bin/activate

Requirements installation:

pip install -r requirements.txt

License

This project is licensed under the MIT License - see the LICENSE.md file for details.