
In this project, we will try to predict whether a person has diabetes.

Dataset to be used: https://www.kaggle.com/uciml/pima-indians-diabetes-database

1. Exploratory Data Analysis and Data Visualization

  • General View
  • Categorical Variables Analysis
  • Numerical Variables Analysis
  • Target Analysis

2. Data Preprocessing and Feature Engineering

  • General View - Recap of the Dataset
  • Outlier Analysis
  • Missing Values Analysis
  • Feature Creation
  • Label and/or One Hot Encoding
  • Standardization
  • Save the Final Dataset --> Pickle Dataset
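
The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the rows are made up, only a few of the dataset's columns are used, and the `AgeGroup` feature, its bin edges, and the file name `diabetes_final.pkl` are hypothetical. It also assumes that a 0 in `Glucose` or `BMI` means "not recorded", which should be checked against the data before relying on it.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up rows that reuse a few column names from the Kaggle file.
df = pd.DataFrame({
    "Glucose": [148, 85, 0, 89, 137, 116],
    "BMI": [33.6, 26.6, 23.3, 0.0, 43.1, 25.6],
    "Age": [50, 31, 32, 21, 33, 30],
    "Outcome": [1, 0, 1, 0, 1, 0],
})
features = ["Glucose", "BMI", "Age"]
df[features] = df[features].astype(float)

# Missing values first, so placeholder zeros do not distort the outlier bounds.
# Assumption: a 0 in these columns means the value was not recorded.
zero_as_missing = ["Glucose", "BMI"]
df[zero_as_missing] = df[zero_as_missing].replace(0, np.nan)
df[zero_as_missing] = df[zero_as_missing].fillna(df[zero_as_missing].median())

# Outliers: clip each feature to [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1, q3 = df[features].quantile(0.25), df[features].quantile(0.75)
iqr = q3 - q1
df[features] = df[features].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)

# Feature creation (hypothetical feature) followed by one-hot encoding.
df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 30, 40, 120],
                        labels=["young", "middle", "senior"])
df = pd.get_dummies(df, columns=["AgeGroup"], drop_first=True)

# Standardization: zero mean and unit variance for the numeric features.
df[features] = StandardScaler().fit_transform(df[features])

# Save the final dataset as a pickle and read it back.
out_path = Path(tempfile.mkdtemp()) / "diabetes_final.pkl"
df.to_pickle(out_path)
restored = pd.read_pickle(out_path)
```

In a real run the scaler would normally be fitted on the training split only and reused on the test split, so that test-set statistics do not leak into training.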

3. Modeling

  • Logistic Regression
  • Naive Bayes Classifier
  • K-Nearest Neighbors Classifier
  • Support Vector Machines
  • Artificial Neural Network Models
  • Decision Tree - DecisionTreeClassifier
  • Bagging - BaggingClassifier
  • Random Forest - RandomForestClassifier
  • AdaBoost - AdaBoostClassifier
  • Gradient Boosting Classifier
  • XGBoost - XGBClassifier
  • LightGBM - LGBMClassifier
  • CatBoost - CatBoostClassifier
  • NGBoost - NGBClassifier

4. Pickling the Models for Later Use
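
As a rough sketch of this step, a fitted scikit-learn estimator can be written to disk with the standard `pickle` module and loaded again later. The synthetic data from `make_classification` stands in for the diabetes dataset, and the file name `logreg.pkl` is a placeholder.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the prepared dataset (8 features, binary target).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Save the fitted model to a file (placeholder name).
model_path = Path(tempfile.mkdtemp()) / "logreg.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back and confirm it predicts exactly like the original.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)
same_predictions = bool((loaded.predict(X) == model.predict(X)).all())
```

Pickled models are generally only safe to load with the same library versions that created them, and pickle files from untrusted sources should never be loaded.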

5. Comparison of Metrics Across Models
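
One possible way to organize the comparison is to collect the same metrics for every model into a single table. The sketch below uses synthetic data and only three of the listed models; the choice of metrics (accuracy, F1, ROC AUC) is an assumption rather than something fixed by this outline.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the prepared dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    })

# One row per model, best ROC AUC first.
comparison = pd.DataFrame(rows).set_index("model").sort_values("roc_auc", ascending=False)
print(comparison.round(3))
```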

--> Steps to follow for each model:

  • Model and Prediction
  • Evaluation of Model
  • Model Tuning
  • Model Visualization (Feature Importances, ROC/AUC Curve, Confusion Matrix, etc.)
  • Saving the Model
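
The five steps above might look like this for a single model. This is a sketch under assumptions: it uses a random forest on synthetic data, a small hypothetical parameter grid, and computes the numbers behind the plots (confusion matrix, ROC AUC, feature importances) without drawing them.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the prepared dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 1. Model and prediction
base = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
y_pred = base.predict(X_test)

# 2. Evaluation of the untuned model
base_accuracy = accuracy_score(y_test, y_pred)

# 3. Model tuning with a small, hypothetical grid and 3-fold cross-validation
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, 5, None], "min_samples_split": [2, 5]},
    cv=3,
    scoring="roc_auc",
)
grid.fit(X_train, y_train)
tuned = grid.best_estimator_

# 4. Quantities that the visualizations would be built from
cm = confusion_matrix(y_test, tuned.predict(X_test))
auc = roc_auc_score(y_test, tuned.predict_proba(X_test)[:, 1])
importances = tuned.feature_importances_

# 5. Saving the model (serialized in memory here; write the bytes to a file to keep them)
blob = pickle.dumps(tuned)
```

Feature importances are only available for tree-based models; for models such as logistic regression, the coefficients would be the closest equivalent.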