Predicting Heart Disease using Machine Learning

This repository contains a Jupyter Notebook that serves as an end-to-end example of a data science and machine learning proof of concept for heart disease classification.

Problem Definition

The goal is binary classification: given a patient's clinical parameters, predict whether or not they have heart disease.

Data

The data originates from the Cleveland database in the UCI Machine Learning Repository; this project uses a pre-formatted version downloaded from Kaggle. It contains 14 attributes: the 13 features described under Features, plus the target variable indicating whether the patient has heart disease.

Evaluation

The initial goal for the proof of concept is to reach 95% accuracy at predicting whether a patient has heart disease.
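As a rough illustration of how this target could be checked, accuracy is the fraction of predictions that match the true labels. The sketch below uses plain Python; the function names, the `TARGET_ACCURACY` constant, and the example labels are hypothetical and do not come from the notebook.

```python
TARGET_ACCURACY = 0.95  # the 95% goal stated above


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def meets_target(y_true, y_pred, target=TARGET_ACCURACY):
    """Return True if accuracy reaches the target threshold."""
    return accuracy(y_true, y_pred) >= target


# Made-up labels for illustration only (1 = heart disease, 0 = no heart disease).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

print(f"accuracy: {accuracy(y_true, y_pred):.2%}")
print(f"meets 95% target: {meets_target(y_true, y_pred)}")
```

In practice, scikit-learn's `accuracy_score` or a fitted classifier's `score` method computes the same quantity.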

Features

The features used for prediction include age, sex, chest pain type, resting blood pressure, serum cholesterol level, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, ST depression induced by exercise relative to rest, slope of the peak exercise ST segment, number of major vessels colored by fluoroscopy, and thallium stress test result.
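For orientation, the sketch below maps each feature to the short column name commonly used in the Cleveland heart disease data. These column names (and `target` for the label) are an assumption about the CSV used here and should be checked against the actual file.

```python
# Assumed column names (common in the Cleveland heart disease CSV); verify
# against the actual file before relying on them.
FEATURE_COLUMNS = {
    "age": "age in years",
    "sex": "sex",
    "cp": "chest pain type",
    "trestbps": "resting blood pressure",
    "chol": "serum cholesterol level",
    "fbs": "fasting blood sugar",
    "restecg": "resting electrocardiographic results",
    "thalach": "maximum heart rate achieved",
    "exang": "exercise-induced angina",
    "oldpeak": "ST depression induced by exercise relative to rest",
    "slope": "slope of the peak exercise ST segment",
    "ca": "number of major vessels colored by fluoroscopy",
    "thal": "thallium stress test result",
}
TARGET_COLUMN = "target"  # assumed name of the label column

# 13 features plus the target give the 14 attributes in the dataset.
ALL_COLUMNS = list(FEATURE_COLUMNS) + [TARGET_COLUMN]

for name, description in FEATURE_COLUMNS.items():
    print(f"{name:>8}: {description}")
```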

Tools and Libraries

The following libraries are utilized for data analysis, visualization, and machine learning tasks:

Model Choices and Hyperparameter Tuning

We tuned hyperparameters with scikit-learn's RandomizedSearchCV and GridSearchCV, adjusting the settings of each algorithm to find the combination that performs best. Here's a summary of our findings:

Logistic Regression:
- Test accuracy: 88.52%
Random Forest:
- Test accuracy: 86.89%
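The tuning process described above can be sketched roughly as follows. This is a minimal example under stated assumptions, not the notebook's actual code: it uses synthetic data from `make_classification` in place of the heart disease CSV, the search spaces are hypothetical, and which search is paired with which model is an arbitrary choice. Its accuracy figures will not match the results reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV,
    RandomizedSearchCV,
    train_test_split,
)

# Synthetic stand-in for the real data: 13 features and a binary target.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical search spaces, chosen only for illustration.
log_reg_grid = {"C": np.logspace(-4, 4, 10)}
rf_grid = {
    "n_estimators": [10, 50, 100],
    "max_depth": [None, 3, 5],
    "min_samples_leaf": [1, 5],
}

# Randomized search: samples n_iter combinations from the space.
rs_rf = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=rf_grid,
    n_iter=5,
    cv=5,
    random_state=42,
)
rs_rf.fit(X_train, y_train)

# Grid search: evaluates every combination in the grid.
gs_log_reg = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid=log_reg_grid,
    cv=5,
)
gs_log_reg.fit(X_train, y_train)

# score() uses the best estimator refit on the full training set and
# reports accuracy on the held-out test split.
results = {
    "Logistic Regression": gs_log_reg.score(X_test, y_test),
    "Random Forest": rs_rf.score(X_test, y_test),
}
for name, acc in results.items():
    print(f"{name}: test accuracy {acc:.2%}")
```

Randomized search is cheaper when the space is large, since it evaluates only `n_iter` candidates; grid search is exhaustive and suits a small, already-narrowed space.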

Project Status

This project is still under development and will be updated regularly with further improvements.

JonathaWRDCosta/Heart-Disease-ML-Project