Diabetes Prediction using Machine Learning

Introduction

This project predicts whether a person has diabetes using machine learning algorithms. It uses the Pima Indians Diabetes Database and treats the task as a binary classification problem: the output is either 1 (diabetic) or 0 (non-diabetic).

Project Overview

The main steps involved in this project are:

  1. Data Preprocessing
  2. Exploratory Data Analysis (EDA)
  3. Model Selection
  4. Model Training
  5. Model Evaluation
  6. Model Deployment (optional)

The project trains several machine learning algorithms on the same data and compares how well each one predicts diabetes.
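To make the comparison in step 5 concrete, here is a minimal sketch of how two models' predictions could be scored with standard binary classification metrics. It uses only the Python standard library. The function name `binary_metrics` and the label lists are made up for illustration; they are not taken from this project's code or results.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts, treating 1 (diabetic) as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = len(y_true)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a_pred = [1, 0, 0, 1, 0, 1, 1, 0]
model_b_pred = [1, 1, 1, 1, 0, 1, 1, 0]

metrics_a = binary_metrics(y_true, model_a_pred)
metrics_b = binary_metrics(y_true, model_b_pred)

for name, metrics in (("model A", metrics_a), ("model B", metrics_b)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this toy example both models reach the same accuracy but differ in precision and recall, which is one reason to look at more than accuracy when a missed diabetic case and a false alarm carry different costs.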

Data Description

The dataset contains the following columns:

  1. Pregnancies: Number of times pregnant
  2. Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
  3. BloodPressure: Diastolic blood pressure (mm Hg)
  4. SkinThickness: Triceps skin fold thickness (mm)
  5. Insulin: 2-hour serum insulin (μU/mL)
  6. BMI: Body mass index (weight in kg/(height in m)^2)
  7. DiabetesPedigreeFunction: Diabetes pedigree function (a function which scores likelihood of diabetes based on family history)
  8. Age: Age in years
  9. Outcome: Class variable (1 = diabetic, 0 = non-diabetic)
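The column list above can double as a quick sanity check before preprocessing. The sketch below uses only the standard library: it verifies the header, counts each Outcome class, and counts zeros in the measurement columns. Treating a 0 in Glucose, BloodPressure, SkinThickness, Insulin, or BMI as a likely missing value is an assumption, not something this README states. The three sample rows are made up, and `summarize` is a hypothetical helper name.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]

# Assumption: a literal 0 in these columns is implausible for a living
# person, so it is counted as a probable missing value.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def summarize(csv_text):
    """Return (row count, Outcome class counts, zero counts per column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")

    n_rows = 0
    outcome_counts = {0: 0, 1: 0}
    zero_counts = {column: 0 for column in ZERO_AS_MISSING}

    for row in reader:
        n_rows += 1
        outcome = int(row["Outcome"])
        if outcome not in (0, 1):
            raise ValueError(f"Outcome must be 0 or 1, got {outcome}")
        outcome_counts[outcome] += 1
        for column in ZERO_AS_MISSING:
            if float(row[column]) == 0:
                zero_counts[column] += 1

    return n_rows, outcome_counts, zero_counts


# Made-up rows in the documented column order, for illustration only.
SAMPLE = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,0,28.5,0.35,30,0
5,160,80,0,0,33.1,0.60,45,1
1,95,64,20,85,24.0,0.20,22,0
"""

n_rows, outcome_counts, zero_counts = summarize(SAMPLE)
print("rows:", n_rows)
print("outcome counts:", outcome_counts)
print("zeros per column:", zero_counts)
```

To run the same check on the real dataset, read the CSV file's text and pass it to `summarize`; the file name depends on how the data is stored in your copy of the repository.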

Installation

  1. Clone the repository:
    git clone https://github.com/yourusername/diabetes-prediction.git
    cd diabetes-prediction