/Bank_Loan_Approval_Using_Artificial_Neural_Nets

In this project, we aim to build and train a deep neural network model to predict the likelihood of liability customers (depositors) buying personal loans based on customer features such as age, experience, income, location, family, education, existing mortgage, credit cards, etc.

Primary LanguageJupyter Notebook

Bank Loan Approval Using Artificial Neural Net

Project objective:

  • Understand how Artificial Intelligence and Machine Learning are applied in the banking industry.
  • Understand the theory and intuition behind Deep Neural Networks.
  • Import key Python libraries, dataset, and perform Exploratory Data Analysis (EDA).
  • Perform data visualization using Seaborn.
  • Standardize the data and split them into train and test datasets.
  • Build a deep learning model using Keras with Tensorflow 2.0 as a back-end.
  • Assess the performance of the model and ensure its generalization using various KPIs.

Project Structure

Task 1: Understand the problem statement and business case

  • In this hands-on project, we will assume that you work as a data scientist at a bank
  • In this project, we aim to build and train a deep neural network model to predict the likelihood of liability customers (depositors) buying personal loans based on customer features such as age, experience, income, location, family, education, existing mortgage, credit cards, etc.
  • Lenddo (https://lenddo.com) is a leading startup that uses advanced machine learning to analyze over 12,000+ features from various sources of data to predict an individual’s creditworthiness (e.x: social media count use, geolocation data, etc,)
  • Most bank customers are depositors and the bank would like to start targeting those customers and issue personal loans to them
  • The bank would like to retain those depositors and convert them to personal loan customers and therefore growing annual revenue
  • In the past, the bank ran a marketing campaign that resulted in a 9% success rate.
  • The bank management team would like to analyze this campaign results and develop a model to target customers who have high probability of obtaining personal loans.
Column Name Column Description
ID Customer ID
Age Customer's age in yearrs
Experience Number of years of professional experience
Income Annual income of customer ($000)
Zipcode Home Adress Zip Code
Family Family size of the customer
CCAvg Average spending on credit cards per months ($000)
Education Education level (1: Undergrad;2: Graduated, 3: Advanced/Professional)
Mortgage Value of house mortgage ($000)
Personal Loan Did this customer accept the personal loan offered in the last campaign or not?
Security Account Does the customer have a securities account with the bank?
Online Does the customer use internet banking facilities?
Credit Card Does the customer uses a credit card issued by UniversalBank

Task 2: Import Datasets and Libraries

Libraries:

  • Pandas: Dataframe manipulating library
  • Numpy: Numerical values, arrays, matrices handling library
  • Matplotlib & Seaborn: Data visualization libraries
  • Tensorflow: Google framework for building and train AI model

Dataset

  • The dataframe contains 5000 rows

Task 3: Exploratory Data Analysis

  • The dataframe contains 14 columns in total
  • Customers range in age from 23 to 67, with an average age of 45
  • The average of income is $73,774
  • The average family size is 2
  • Credit Card Debt is an average of $1,938 per person

  • 29.4% of customers in this dataset have credit card debt
  • 9.6% of customers in this dataset have a personal loan

Task 4: Perform Data Visualization

Visualize personal Loan column

  • Percentage of customers who accepted personal loan ~ 9%

Visualize Education feature

  • Most of customer has undergraduate degree

Visualize Age

  • Uniform distribution between 30-60 years

Visualize credit card availability feature

  • Recall that ~29% of customers have credit cards

Visualize income data

  • Most customers have incomes that range between 45K and 60K per year
  • Data is skewed with less customers earning above 100K

Customers with loan vs Customer without loan

  • Mean income of customers who have personal loans is generally high ~ 144K and average CC of 3.9K
  • Mean income of customers who have do not have personal loans is generally low ~ 66K and average CC of 1.7K
Average Credit Card Debt Average Income
  • Customers who took personal loans tend to have higher income & debt
Average Credit Card Debt Average Income

Correlation plot

  • Stong Positive correlation between experience and age
  • Strong positive correlation between CC average and income

Credit Card Spending

  • Most customers Credit card spending is between 0-4K
  • Data is positively skewed

Credit card average and personal loan relationship

  • Customers who have large credit card average tend to have personal loans

Task 5: Prepare the data to feed the model

Features dataframe

  • Dataframe X created by dropping the target variable ‘Personal Loan’

Target dataframe

  • Another dataframe y created contains only target variable 'Personal Loan'

Apply One Hot Encoding Method

  • Apply the one hot encoding method to prepare for the algorithm and get a better prediction
  • With one-hot, we convert each categorical value into a new categorical column and assign a binary value of 1 or 0 to those columns.

Scale and split the data

  • Apply the one hot encoding method to prepare for the algorithm and get a better prediction
    • With one-hot, we convert each categorical value into a new categorical column and assign a binary value of 1 or 0 to those columns.
  • This is vital to ensure that all the features that are fed to the artificial neural networks have equal weights.
  • We must scale the data before training the model using StandardScaler from Sklearn
    • Basically, StandardScaler removes the mean and scales each feature or variable to unit variance
  • Split the data into testing and training sets
    • 90% for trainning
    • 10% for testing

Task 6: Understand the theory and intuition behind Artificial Neural Networks

Neuro Mathematical Model

  • Using an input channel called dendrites, the neuron collects signals, processes them in its nucleus, and then generates an output in the form of long thin branches called axons.

Single neuron model math

  • Bias allows to shift the activation function curve up or down.
  • Number of adjustable parameters = 4 (3 weights and 1 bias).
  • Activation function F.

ANN Training Process

  • Inputs X, arrive through the preconnected path
  • Input is modeled using real weights W. The weights are usually randomly selected.
  • Calculate the output for every neuron from the input layer, to the hidden layers, to the output layer.
  • Calculate the error in the outputs
  • Compute the gradient to change the weight: Travel back from the output layer to the hidden layer to adjust the weights such that the error is decreased.
  • Keep repeating the process until the desired output is achieved (the loss function is stable: Optimization done!)
ANN Training Process Error Performance Measure vs Epoch of Times

Gradient Descent

  • Gradient descent is an optimization algorithm used to obtain the optimized network weight and bias values
  • It works by iteratively trying to minimize the const function
  • It works by calculating the gradient of the cost function and moving in the negative direction until the local/global minimum is achieved
  • If the positive of the gradient is taken, local/global maximum is achieved

Learning Rate

  • The size of the steps taken are called the learning rate
  • The learning rate increases, the area covered in the search space will increase so we might reach global minimum faster
  • However, we can overshoot the target
  • For small learning rates, training will take much longer to reach optimized weight values

Task 7: Build a simple Multi Layer Neural Network

Task 8: Compile and train a Deep Learning Model

Task 9: Assess the performance of the trained model