In credit risk data, good loans far outnumber risky ones, so the classes are heavily imbalanced. It is therefore important to employ training and evaluation techniques that help the model learn from the minority class as well as the majority.
In this project, we will be using imbalanced-learn and scikit-learn libraries to build and evaluate models using resampling.
We will evaluate three machine learning models and determine which is the best for predicting credit risk. You can find the code for this part of the project here
The steps involved in this analysis are as follows:
Before we start, import all the dependencies for this project.
STEP 1: Transform the data into a usable form which involves:
- Loading the data
- Dropping NULL values from columns and rows
- Converting strings to numerical datatypes
- Converting target column values to High Risk and Low Risk based on their values
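The cleaning steps above can be sketched as follows. The DataFrame here is a tiny stand-in for the real CSV (which you would load with `pd.read_csv`), and the column names and status labels are hypothetical placeholders:

```python
import pandas as pd

# Toy stand-in for the loan CSV; in the project you would use pd.read_csv(...)
# with the real file. Column names and status values are hypothetical.
df = pd.DataFrame({
    "loan_amnt": [1000, 2000, 3000, 4000],
    "int_rate": ["10.5%", "7.2%", None, "15.0%"],
    "unused": [None, None, None, None],  # an all-null column
    "loan_status": ["Current", "Late (31-120 days)", "Current", "Charged Off"],
})

# Drop all-null columns, then rows with remaining nulls
df = df.dropna(axis=1, how="all").dropna()

# Convert percentage strings to numeric dtypes
df["int_rate"] = df["int_rate"].str.rstrip("%").astype(float) / 100

# Binarize the target: "Current" loans become low risk, everything else high risk
df["loan_status"] = df["loan_status"].apply(
    lambda s: "low_risk" if s == "Current" else "high_risk"
)
```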
STEP 2: Split the data into Training and Testing sets
Going a little further, we can
- Check the balance of target values
- Check the shape of the X training set
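A minimal split-and-inspect sketch, using synthetic imbalanced data in place of the cleaned loan DataFrame (in the project, `X` and `y` would come from the prepared data):

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the cleaned loan dataset
X, y = make_classification(
    n_samples=1000, n_classes=2, weights=[0.95, 0.05], random_state=1
)

# Stratify so the rare class appears in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)

print(Counter(y_train))  # balance of target values
print(X_train.shape)     # shape of the X training set
```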
STEP 3: Oversampling: Here you will compare two oversampling algorithms to determine which performs better.
- Using Naive Random Oversampling
- Using SMOTE Oversampling
STEP 4: Undersampling: Let us use the Cluster Centroids algorithm here.
STEP 5: Over and Under Sampling (SMOTEENN)
Here, we will use imblearn.ensemble's BalancedRandomForestClassifier and EasyEnsembleClassifier to predict credit risk and evaluate each model.
You can find the code for this part of the project here
Before we start, ensure you have installed all the necessary libraries. If not, do a quick pip install imbalanced-learn and pip install -U scikit-learn. Bring in all the dependencies as well.
STEP 1: Much like before, bring in the CSV and clean it up so it can be used for risk analysis and testing.
STEP 2: Split the data into Training and Testing sets
STEP 3: Ensemble Learners: Here, you will train a Balanced Random Forest Classifier and an Easy Ensemble AdaBoost classifier to see which one gives better results.
- Balanced Random Forest Classifier:
- List the features sorted in descending order by feature importance
- Easy Ensemble AdaBoost Classifier
Let us compare the various results:
Naive Oversampling Results
SMOTE Results
Cluster Centroid Results
SMOTEENN Results
Easy Ensemble Results
From these results, we notice very low precision for the High_Risk class. This indicates a large number of false positives: many loans flagged as high risk are actually low risk.
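To see why low precision means many false positives, recall the definition precision = TP / (TP + FP). A quick arithmetic check with hypothetical counts for the high-risk class:

```python
# Hypothetical confusion-matrix counts for the high-risk class
tp = 30    # high-risk loans correctly flagged
fp = 970   # low-risk loans incorrectly flagged as high risk

# Precision = TP / (TP + FP); false positives dominate the denominator
precision = tp / (tp + fp)
print(round(precision, 3))  # 0.03
```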