This repository contains code for building and evaluating several machine-learning models that predict the likelihood of a stroke from patient health data. The dataset includes medical and demographic attributes such as gender, age, hypertension status, and heart disease status.
Kaggle Link: https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset
The dataset includes the following attributes:
- Gender
- Age
- Hypertension
- Heart disease
- Ever married
- Work type
- Residence type
- Average glucose level
- BMI
- Smoking status
- Stroke (Target Variable)
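As a rough illustration of how attributes like these might be prepared for modeling, the sketch below builds a tiny made-up table and one-hot encodes the text fields. The column names (`gender`, `avg_glucose_level`, and so on), the sample values, and the median fill for a missing BMI are all assumptions for illustration; check them against the actual CSV from Kaggle and the notebooks.

```python
import pandas as pd

# Made-up rows for illustration only; the column names are assumed, not taken from the repository.
sample = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "age": [67.0, 49.0, 80.0],
    "hypertension": [0, 0, 1],
    "heart_disease": [1, 0, 0],
    "ever_married": ["Yes", "Yes", "No"],
    "work_type": ["Private", "Self-employed", "Private"],
    "residence_type": ["Urban", "Rural", "Urban"],
    "avg_glucose_level": [228.69, 171.23, 83.75],
    "bmi": [36.6, None, 29.1],
    "smoking_status": ["formerly smoked", "never smoked", "smokes"],
    "stroke": [1, 0, 0],
})

# Text-valued columns that need encoding before most models can use them.
CATEGORICAL = ["gender", "ever_married", "work_type", "residence_type", "smoking_status"]

def prepare_features(df, target="stroke"):
    """Split off the target, fill missing BMI with the median, and one-hot encode text columns."""
    X = df.drop(columns=[target])
    y = df[target]
    # Median imputation is one possible choice, assumed here for simplicity.
    X = X.assign(bmi=X["bmi"].fillna(X["bmi"].median()))
    X = pd.get_dummies(X, columns=CATEGORICAL)
    return X, y

X, y = prepare_features(sample)
print(X.columns.tolist())
```

The resulting `X` holds only numeric and indicator columns, which is the form the classifiers in this project typically expect.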
To set up the project environment:
- Clone the repository: https://github.com/jayasurya247/Stroke_Prediction.git
- Run the Python script or Jupyter Notebook.
The following models are trained and compared:
- Random Forest Classifier
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- Decision Tree Classifier
- XGBoost Classifier
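The models listed above could be instantiated along these lines. This is a minimal sketch, not the configuration used in the notebooks: the default hyperparameters, the `build_models` helper, the seed, and the synthetic data from `make_classification` are all assumptions, and XGBoost is added only if the separate `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_models(seed=42):
    """Return the classifiers keyed by name, using library defaults (an assumption)."""
    models = {
        "Random Forest": RandomForestClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(),
        "SVM": SVC(random_state=seed),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
    }
    try:
        # XGBoost ships as a separate package; skip it if it is not installed.
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(random_state=seed)
    except ImportError:
        pass
    return models

# Synthetic, imbalanced stand-in data (roughly 10% positives), not the stroke dataset.
X, y = make_classification(n_samples=300, n_features=8, weights=[0.9, 0.1], random_state=0)

models = build_models()
for name, model in models.items():
    model.fit(X, y)
    print(name, "fitted")
```

Keeping the models in a dictionary makes it straightforward to loop over them and apply the same training and evaluation steps to each.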
The models were evaluated on accuracy, precision, recall, and F1-score, using both the original dataset and a version oversampled with SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance.
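To make the idea behind SMOTE concrete, here is a simplified pure-Python illustration: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbors. This is not the implementation used in the notebooks (a library such as imbalanced-learn's `SMOTE` is the usual choice in practice); the `smote_sketch` function, its parameters, and the toy points are hypothetical.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples and their neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbors of the base point (excluding itself), by Euclidean distance.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Toy minority-class samples with two numeric features each.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points), "synthetic samples created")
```

Oversampling of this kind is normally applied to the training split only, so that the test set still reflects the original class distribution.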
Model performance is reported in the notebooks, which include detailed classification reports and accuracy comparisons.
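For reference, the four reported metrics can be computed from confusion-matrix counts as in the sketch below. The `classification_metrics` helper and the toy labels are hypothetical and are not results from this project; the second example shows why accuracy alone can mislead on an imbalanced target.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels: 2 positives out of 10, with one hit, one miss, and one false alarm.
mixed = classification_metrics([1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                               [1, 0, 1, 0, 0, 0, 0, 0, 0, 0])

# A model that always predicts "no stroke" on an imbalanced toy set.
always_negative = classification_metrics([0] * 18 + [1] * 2, [0] * 20)
print(mixed)
print(always_negative)
```

In the second case accuracy is 0.9 even though recall for the positive class is 0, which is why precision, recall, and F1 are reported alongside accuracy.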
This project is licensed under the MIT License - see the LICENSE file for details.
Feel free to fork this project and contribute to improving the stroke prediction models.