Machine Learning Pipeline on Diabetes Dataset

Business Problem:

Develop a machine learning model that predicts whether a person has diabetes, given their characteristics.

Dataset Story:

The dataset is part of a larger dataset held at the National Institute of Diabetes and Digestive and Kidney Diseases in the USA. It was collected for diabetes research on Pima Indian women aged 21 and over living in Phoenix, Arizona, the fifth-largest city in the USA. It consists of 768 observations and 8 numerical independent variables. The target variable is "Outcome": 1 indicates a positive diabetes test result and 0 indicates a negative one.

Variables:

Pregnancies: Number of pregnancies
Glucose: Glucose
BloodPressure: Blood pressure
SkinThickness: Skin thickness
Insulin: Insulin
BMI: Body mass index
DiabetesPedigreeFunction: A function that scores the likelihood of diabetes based on family history
Age: Age (years)
Outcome: Whether the person has diabetes (1) or not (0)
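
The variable list above can be checked against a loaded table before any modeling. The following is a minimal sketch, assuming the data is held in a pandas DataFrame. `check_schema` is a hypothetical helper, and the three rows in `toy` are made-up values for illustration, not records from the real dataset.

```python
import pandas as pd

# Column names taken from the variable list above.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def check_schema(df: pd.DataFrame) -> None:
    """Raise ValueError if a column is missing or Outcome is not binary."""
    missing = [col for col in FEATURES + [TARGET] if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not set(df[TARGET].unique()) <= {0, 1}:
        raise ValueError("Outcome must contain only 0 and 1")


# Made-up rows for illustration only (hypothetical values).
toy = pd.DataFrame({
    "Pregnancies": [2, 0, 5],
    "Glucose": [120, 95, 160],
    "BloodPressure": [70, 64, 82],
    "SkinThickness": [25, 20, 33],
    "Insulin": [90, 60, 180],
    "BMI": [28.5, 22.1, 35.4],
    "DiabetesPedigreeFunction": [0.35, 0.20, 0.80],
    "Age": [31, 24, 47],
    "Outcome": [0, 0, 1],
})

check_schema(toy)  # passes silently when the layout matches
```

Running the same check on the real file would catch a renamed or dropped column early, before it surfaces as a confusing error during training.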

Project Stages:

Exploratory Data Analysis
Data Preprocessing
Model & Prediction
Model Evaluation
Model Validation: Holdout
Model Validation: 10-Fold Cross Validation
Prediction for a New Observation
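
The modeling, validation, and prediction stages listed above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the project's actual code: the README does not name a model, so logistic regression with standard scaling is assumed here, and a synthetic dataset of the same shape (768 rows, 8 features) stands in for the real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the dataset (assumption:
# replace X and y with the real features and the Outcome column).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=42
)

# Preprocessing and model in one pipeline, so scaling is fit only on
# the training portion of each split (assumed model: logistic regression).
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model validation: holdout (80% train, 20% test, stratified by class).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
model.fit(X_train, y_train)
holdout_acc = accuracy_score(y_test, model.predict(X_test))
holdout_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Model validation: 10-fold cross validation on the full data.
cv_results = cross_validate(model, X, y, cv=10, scoring=["accuracy", "roc_auc"])
cv_acc = cv_results["test_accuracy"].mean()
cv_auc = cv_results["test_roc_auc"].mean()

# Prediction for a new observation: a single row with 8 feature values.
new_observation = X[[0]]
prediction = model.predict(new_observation)

print(f"Holdout accuracy: {holdout_acc:.3f}, AUC: {holdout_auc:.3f}")
print(f"10-fold accuracy: {cv_acc:.3f}, AUC: {cv_auc:.3f}")
print(f"Predicted class for the new observation: {prediction[0]}")
```

Wrapping the scaler and the classifier in one pipeline keeps the cross-validation honest, because the scaling statistics are recomputed inside each fold rather than leaked from the held-out rows.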

SerdarTafrali/Machine_Learning_Pipeline_on_Diabetes_Dataset
