Home Credit Default Risk [Machine Learning Project]

This is a Python-based implementation of two different types of machine learning models (listed under "Models Used") for the "Home Credit Default Risk" task.

Language and Libraries

Seaborn, scikit-learn, NumPy, cplusplus

About Dataset:

Home Credit Default Risk dataset from Kaggle Competitions

Data

Table of Contents:

Data Loading

!kaggle competitions download home-credit-default-risk
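Once the archive is downloaded, individual tables can be read straight from it. The following is a minimal sketch, not the notebook's actual loading code: the helper name `load_table` is hypothetical, and the archive and member names in the commented example are assumptions about what the download produces.

```python
import zipfile

import pandas as pd


def load_table(archive_path, member):
    """Read one CSV member of a zip archive into a DataFrame without extracting it."""
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open(member) as handle:
            return pd.read_csv(handle)


# Example call (both names are assumptions, adjust to the files you actually have):
# app_train = load_table("home-credit-default-risk.zip", "application_train.csv")
```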

Exploratory Data Analysis

  1. Checking Missing Values (the data contains many null values, which need to be cleaned or replaced using imputation techniques)
  2. Checking Duplicate Data (number of duplicates in the data: 0)
  3. Data Visualization
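The missing-value and duplicate checks above can be sketched with pandas as follows. The helper names are hypothetical and the toy frame only stands in for the real application table; it is not taken from the project's notebook.

```python
import numpy as np
import pandas as pd


def missing_value_report(df):
    """Return missing counts and percentages per column, worst columns first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": 100 * counts / len(df),
    })
    # Keep only the columns that actually contain missing values.
    report = report[report["missing_count"] > 0]
    return report.sort_values("missing_pct", ascending=False)


def count_duplicates(df):
    """Count rows that are exact copies of an earlier row."""
    return int(df.duplicated().sum())


# Toy data standing in for the application table (values are made up).
toy = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3, 4],
    "AMT_INCOME_TOTAL": [100.0, np.nan, 300.0, np.nan],
    "NAME_CONTRACT_TYPE": ["Cash loans", np.nan, "Cash loans", "Revolving loans"],
})

report = missing_value_report(toy)
n_duplicates = count_duplicates(toy)
print(report)
print("duplicates:", n_duplicates)
```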

Feature Engineering

  1. Feature engineering on the application train data

Data Preparation

  1. Merging all 6 Datasets - Key = SK_ID_CURR
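A merge on `SK_ID_CURR` might look like the sketch below. It assumes that a secondary table can hold several rows per applicant and therefore aggregates it before joining; the mean aggregation, the helper name `merge_on_applicant`, and the toy values are illustrative assumptions, not the project's confirmed approach.

```python
import pandas as pd


def merge_on_applicant(main, secondary, prefix, key="SK_ID_CURR"):
    """Average the numeric columns of a one-to-many table per applicant, then left-join."""
    numeric = secondary.select_dtypes("number").drop(columns=[key], errors="ignore")
    aggregated = numeric.groupby(secondary[key]).mean().add_prefix(prefix)
    # reset_index turns the group key back into an ordinary column for the join.
    return main.merge(aggregated.reset_index(), how="left", on=key)


# Toy tables (values are made up).
application = pd.DataFrame({"SK_ID_CURR": [1, 2, 3], "TARGET": [0, 1, 0]})
bureau = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2],
    "AMT_CREDIT_SUM": [100.0, 300.0, 50.0],
})

merged = merge_on_applicant(application, bureau, prefix="BUREAU_")
print(merged)
```

A left join keeps every applicant from the main table; applicants with no rows in the secondary table get missing values, which the imputation step then has to handle.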

Data Preprocessing

  1. Imputing Categorical & Numerical Data (SimpleImputer)
  2. Scaling Numerical Data (StandardScaler)
  3. Encoding Categorical Data (OneHotEncoder)
  4. Class Balancing (RandomOverSampler)
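The four preprocessing steps above can be sketched as a scikit-learn `ColumnTransformer` followed by random oversampling. The imputation strategies, column names, and toy values are assumptions. To keep the example free of extra dependencies, the oversampling is a small NumPy stand-in for what imbalanced-learn's `RandomOverSampler` provides, not that class itself.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols, categorical_cols):
    """Impute and scale numeric columns; impute and one-hot encode categorical ones."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # strategy is an assumption
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # strategy is an assumption
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer(
        [("num", numeric, numeric_cols), ("cat", categorical, categorical_cols)],
        sparse_threshold=0.0,  # always return a dense array
    )


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for label in classes:
        rows = np.flatnonzero(y == label)
        extra = rng.choice(rows, size=target - len(rows), replace=True)
        parts.append(np.concatenate([rows, extra]))
    idx = np.concatenate(parts)
    return X[idx], y[idx]


# Toy data (values are made up).
X = pd.DataFrame({
    "AMT_CREDIT": [1000.0, np.nan, 3000.0, 2000.0, 4000.0, 2500.0],
    "NAME_CONTRACT_TYPE": ["Cash loans", "Revolving loans", np.nan,
                           "Cash loans", "Cash loans", "Revolving loans"],
})
y = [0, 0, 0, 0, 1, 0]

preprocessor = build_preprocessor(["AMT_CREDIT"], ["NAME_CONTRACT_TYPE"])
X_t = preprocessor.fit_transform(X)
X_bal, y_bal = random_oversample(X_t, y)
print(X_t.shape, X_bal.shape, np.bincount(y_bal))
```

In practice the preprocessor would be fitted on the training split only, and oversampling applied only to training rows, so that validation scores reflect the original class balance.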

Feature Selection

Model Used - LGBMClassifier

Classification

Models Used:

  1. LGBM Classifier
  2. RandomForest Classifier

Model Evaluation
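The source does not name the evaluation metrics. The sketch below assumes ROC AUC (the metric the Kaggle competition scores on) plus a confusion matrix at a 0.5 threshold. The helper name `evaluate`, the threshold, and the four toy predictions are illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def evaluate(y_true, y_score, threshold=0.5):
    """Return ROC AUC on the scores and a confusion matrix at the given threshold."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)
    return {
        "roc_auc": roc_auc_score(y_true, y_score),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }


# Toy labels and predicted default probabilities (values are made up).
result = evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(result["roc_auc"])
print(result["confusion_matrix"])
```

With a heavily imbalanced target, a ranking metric such as ROC AUC is usually more informative than plain accuracy, because a model that always predicts "no default" can still reach high accuracy.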

Hyperparameter Tuning
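The source does not say which search strategy or parameter ranges were used. As a minimal sketch, a cross-validated grid search with scikit-learn's `GridSearchCV` could look like this; the choice of `RandomForestClassifier`, the tiny grid, the `roc_auc` scoring, and the synthetic data are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic data set so the search finishes quickly.
X_tune, y_tune = make_classification(
    n_samples=300, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Illustrative grid; real ranges would depend on the data and compute budget.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X_tune, y_tune)

print(search.best_params_)
print(search.best_score_)
```

The same pattern applies to `LGBMClassifier` by swapping the estimator and the grid; for larger search spaces, `RandomizedSearchCV` samples a fixed number of combinations instead of trying them all.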

Results