Early-stage-diabetes-risk-prediction-analysis

About the dataset

This dataset contains sign and symptom data for newly diabetic or would-be diabetic patients. The data were collected using direct questionnaires from patients of Sylhet Diabetes Hospital in Sylhet, Bangladesh, and approved by a doctor.

Diabetes is a disease that occurs when your blood glucose, also called blood sugar, is too high. Blood glucose is your main source of energy and comes from the food you eat. Insulin, a hormone made by the pancreas, helps glucose from food get into your cells to be used for energy.

The number of patients with diabetes has been growing rapidly in recent years. Machine learning models can support early detection and diagnosis. By analyzing the data and developing models, we can predict a person's diabetes status from the reported signs and symptoms.

Data attributes:

Age: 20-65
Sex: 1. Male, 2. Female
Polyuria: 1. Yes, 2. No
Polydipsia: 1. Yes, 2. No
Sudden weight loss: 1. Yes, 2. No
Weakness: 1. Yes, 2. No
Polyphagia: 1. Yes, 2. No
Genital thrush: 1. Yes, 2. No
Visual blurring: 1. Yes, 2. No
Itching: 1. Yes, 2. No
Irritability: 1. Yes, 2. No
Delayed healing: 1. Yes, 2. No
Partial paresis: 1. Yes, 2. No
Muscle stiffness: 1. Yes, 2. No
Alopecia: 1. Yes, 2. No
Obesity: 1. Yes, 2. No
Class: 1. Positive, 2. Negative
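
Since almost every attribute above is a two-valued answer, a typical preparation step is to turn the answers into numbers before modeling. The following is a minimal sketch of that idea, not the notebook's actual code. The sample rows, the exact column headers, and the 1/0 encoding are assumptions made for illustration; the real file may use different names and values.

```python
import csv
import io

# Hypothetical sample rows: a small in-memory CSV stands in for the real
# dataset file, whose name and exact headers are not given in this README.
SAMPLE_CSV = """Age,Sex,Polyuria,Polydipsia,Class
40,Male,No,Yes,Positive
58,Female,Yes,No,Negative
"""

# Illustrative encoding choice (an assumption): map each answer to 1 or 0.
BINARY_MAP = {
    "Yes": 1, "No": 0,
    "Male": 1, "Female": 0,
    "Positive": 1, "Negative": 0,
}


def encode_row(row):
    """Convert one questionnaire record into numeric values."""
    encoded = {}
    for column, value in row.items():
        value = value.strip()
        if column == "Age":
            encoded[column] = int(value)          # Age is the only numeric field
        else:
            encoded[column] = BINARY_MAP[value]   # all other fields are two-valued
    return encoded


def load_encoded(text):
    """Parse CSV text and return a list of encoded records."""
    reader = csv.DictReader(io.StringIO(text))
    return [encode_row(row) for row in reader]


records = load_encoded(SAMPLE_CSV)
print(records[0])
```

An unexpected answer (for example a misspelled "yes") raises a `KeyError` here, which makes data-quality problems visible during cleaning instead of silently passing them on to the models.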

Actions taken:

· Conducted an analysis to predict whether a person has early-stage diabetes based on signs and symptoms, using classification algorithms
· Performed exploratory data analysis (EDA): data cleaning, data preparation, correlation analysis, and data visualization
· Developed and implemented 7 supervised machine learning algorithms that predict early-stage diabetes with more than 90% accuracy
· Compared each model's prediction performance: the Gradient Boosting Classifier, Decision Tree, and Random Forest performed best, with precision ranging from 97.44% to 98.08%
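
The comparison step above can be sketched roughly as follows, using scikit-learn. This is only an illustration under stated assumptions: the data are synthetic stand-ins, default hyperparameters are used, and only the three models named above are included, since the README does not list all seven models or their settings. The scores it prints will not reproduce the reported figures.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption): 5 binary "symptom" columns and a
# label that depends on the first two columns plus some random noise.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 5))
noise = rng.integers(0, 2, size=400)
y = ((X[:, 0] + X[:, 1] + noise) >= 2).astype(int)

# Hold out 25% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Default hyperparameters; the notebook's actual settings are not known here.
models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions, zero_division=0),
    }

# Rank the models by precision, highest first.
ranked = sorted(results.items(), key=lambda item: item[1]["precision"], reverse=True)
for name, scores in ranked:
    print(f"{name}: accuracy={scores['accuracy']:.3f}, precision={scores['precision']:.3f}")
```

Precision is the share of predicted positives that are truly positive, so it complements accuracy when a false positive result carries a cost. Evaluating every model on the same held-out split keeps the comparison fair.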