This project predicts whether a person has diabetes using machine learning algorithms. It uses the Pima Indians Diabetes Database, a widely used benchmark dataset. The task is binary classification: the output is either 1 (diabetic) or 0 (non-diabetic).
- Project Overview
- Data Description
- Installation
- Usage
- Model Training and Evaluation
- Results
- Contributing
- License
The main steps involved in this project are:
- Data Preprocessing
- Exploratory Data Analysis (EDA)
- Model Selection
- Model Training
- Model Evaluation
- Model Deployment (optional)
The project trains several machine learning algorithms and compares their performance in predicting diabetes.
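As a rough illustration of the training and evaluation steps, the sketch below fits two classifiers and compares their accuracy on a held-out split. It assumes scikit-learn is installed. The two models (logistic regression and random forest), the 75/25 split, and the synthetic data from `make_classification` are illustrative assumptions, not necessarily what this project uses.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 8 features, binary target.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)

# Hold out 25% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models (hypothetical choices for this sketch).
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Train each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Report the models from best to worst.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: accuracy={acc:.3f}")
```

Accuracy is only one option. On an imbalanced medical dataset, precision, recall, or ROC AUC may be more informative.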
The dataset contains the following columns:
- Pregnancies: Number of times pregnant
- Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- BloodPressure: Diastolic blood pressure (mm Hg)
- SkinThickness: Triceps skin fold thickness (mm)
- Insulin: 2-hour serum insulin (mu U/ml)
- BMI: Body mass index (weight in kg/(height in m)^2)
- DiabetesPedigreeFunction: Diabetes pedigree function (a function that scores the likelihood of diabetes based on family history)
- Age: Age in years
- Outcome: Class variable (0 or 1)
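The columns listed above can be checked when the data is loaded. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly these column names in this order. Treating a zero in `Glucose`, `BloodPressure`, `SkinThickness`, `Insulin`, or `BMI` as a missing measurement is a common convention for this dataset, but it is an assumption here rather than something this README specifies. The helper `load_rows` and the two sample rows are hypothetical.

```python
import csv
import io

# Column names as listed above, in the order assumed for the CSV header.
COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]

# Assumption: a zero in these columns marks a missing measurement.
ZERO_MEANS_MISSING = {"Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"}


def load_rows(handle):
    """Parse CSV text into dicts of floats, with None for assumed-missing values."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for raw in reader:
        row = {name: float(raw[name]) for name in COLUMNS}
        # The class variable must be binary.
        if row["Outcome"] not in (0.0, 1.0):
            raise ValueError(f"Outcome must be 0 or 1, got {raw['Outcome']}")
        # Replace assumed-missing zeros with None so they can be imputed later.
        for name in ZERO_MEANS_MISSING:
            if row[name] == 0.0:
                row[name] = None
        rows.append(row)
    return rows


# Two made-up rows for illustration only; these are not real records.
SAMPLE = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,30,0,25.5,0.35,31,0
5,160,80,0,130,33.1,0.60,45,1
"""

rows = load_rows(io.StringIO(SAMPLE))
print(rows[0]["Insulin"], rows[1]["Outcome"])  # prints: None 1.0
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.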
- Clone the repository:
    git clone https://github.com/yourusername/diabetes-prediction.git
    cd diabetes-prediction