Heart Attack Prediction Using Machine Learning

Overview

This repository contains a machine learning project aimed at predicting the likelihood of a heart attack based on a set of medical attributes. The dataset includes various patient features such as age, cholesterol levels, and exercise-induced angina, among others. The goal of this project is to develop a robust predictive model that can assist in early diagnosis and prevention of heart-related conditions.

Table of Contents

  • Features
  • Exploratory Data Analysis (EDA)
  • Data Preprocessing
  • Model Building
  • Model Evaluation
  • Hyperparameter Tuning
  • Results
  • Installation
  • Usage

Features

The dataset includes the following features:

  • Age: Age of the patient (years).
  • Sex: Gender of the patient (1 = Male, 0 = Female).
  • cp (Chest Pain Type):
    • 0: Typical angina
    • 1: Atypical angina
    • 2: Non-anginal pain
    • 3: Asymptomatic
  • trestbps: Resting blood pressure (in mm Hg).
  • chol: Serum cholesterol in mg/dl.
  • fbs: Fasting blood sugar > 120 mg/dl (1 = True; 0 = False).
  • restecg: Resting electrocardiographic results.
  • thalach: Maximum heart rate achieved.
  • exang: Exercise-induced angina (1 = Yes; 0 = No).
  • oldpeak: ST depression induced by exercise relative to rest.
  • slope: Slope of the peak exercise ST segment.
  • ca: Number of major vessels (0-3) colored by fluoroscopy.
  • thal: Thalassemia (0 = Normal; 1 = Fixed defect; 2 = Reversible defect).
  • target: Heart attack occurrence (1 = Yes, 0 = No).
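For orientation, the snippet below loads the dataset with pandas and inspects these columns. The file name heart.csv is an assumption; adjust it to the CSV actually shipped with this repository.

import pandas as pd

# Assumed file name -- replace with the dataset file used in the notebook
df = pd.read_csv("heart.csv")

print(df.shape)                      # number of rows and columns
print(df.dtypes)                     # data type of each feature
print(df["target"].value_counts())   # class balance of the target variable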

Exploratory Data Analysis (EDA)

Extensive EDA was performed to understand the data distribution, identify correlations, and uncover hidden patterns. Key steps included:

  • Univariate Analysis: Histograms, box plots, and density plots were created to inspect the distribution of individual features.
  • Bivariate Analysis: Pair plots and correlation heatmaps were used to explore relationships between features and the target variable.
  • Outlier Detection: Outliers were identified and analyzed using statistical methods.
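A minimal sketch of these EDA steps, assuming the data is loaded into a pandas DataFrame named df and that seaborn and matplotlib are available:

import matplotlib.pyplot as plt
import seaborn as sns

# Univariate analysis: distribution of a single continuous feature
sns.histplot(df["chol"], kde=True)
plt.show()

# Bivariate analysis: correlation heatmap across the (numeric) features
sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
plt.show()

# Outlier detection: flag values outside 1.5 * IQR for a given column
q1, q3 = df["trestbps"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["trestbps"] < q1 - 1.5 * iqr) | (df["trestbps"] > q3 + 1.5 * iqr)]
print(len(outliers), "potential outliers in trestbps")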

Data Preprocessing

To ensure the data was suitable for model building, the following preprocessing steps were taken:

  • Handling Missing Values: No missing values were found in the dataset.
  • Feature Scaling: Continuous features were standardized using z-score normalization.
  • Encoding Categorical Variables: Categorical features were converted into numerical values using one-hot encoding.
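A sketch of this preprocessing, assuming df holds the raw data; the split into continuous and categorical columns mirrors the feature list above and may differ from the notebook's exact choices:

from sklearn.preprocessing import StandardScaler
import pandas as pd

continuous = ["age", "trestbps", "chol", "thalach", "oldpeak"]
categorical = ["cp", "restecg", "slope", "ca", "thal"]

# One-hot encode the multi-class categorical features
df_encoded = pd.get_dummies(df, columns=categorical, drop_first=True)

# Standardize continuous features to zero mean and unit variance (z-score)
scaler = StandardScaler()
df_encoded[continuous] = scaler.fit_transform(df_encoded[continuous])

X = df_encoded.drop("target", axis=1)
y = df_encoded["target"]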

Model Building

Multiple machine learning models were tested to find the best-performing one. The models included:

  • Logistic Regression
  • Random Forest
  • XGBoost
  • Neural Networks (Keras)
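A hedged sketch of how the classical models could be trained and compared on a hold-out split (the 80/20 split and random seed are assumptions; XGBoost requires the separate xgboost package):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "XGBoost": XGBClassifier(random_state=42),
}

# Fit each model and report hold-out accuracy
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, "test accuracy:", clf.score(X_test, y_test))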

Model Architecture for Neural Networks:

from keras.models import Sequential
from keras.layers import Dense

# Feed-forward network: two hidden ReLU layers and a sigmoid output
# for binary classification (heart attack vs. no heart attack)
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Binary cross-entropy loss with the Adam optimizer; accuracy tracked during training
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=10)

Model Evaluation

The models were evaluated based on the following metrics:

  • Accuracy
  • Precision
  • Recall
  • F1-Score
  • ROC-AUC Score
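A sketch of how these metrics could be computed with scikit-learn for a fitted classifier clf (assumed to expose predict and predict_proba):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-Score :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_prob))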

Results:

  • Training Accuracy: 99.59%
  • Test Accuracy: 83.61%
  • ROC-AUC Score: 0.91 (for the best model)

Hyperparameter Tuning

Hyperparameter tuning was performed using GridSearchCV and RandomizedSearchCV to optimize model performance; the example below applies GridSearchCV to the XGBoost classifier.

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2]
}

# 5-fold cross-validated grid search over the XGBoost hyperparameters
xgb_clf = XGBClassifier(random_state=42)
grid_search = GridSearchCV(estimator=xgb_clf, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)
print("Best CV accuracy:", grid_search.best_score_)

Results

The final model demonstrates strong predictive capability with the following key results:

  • Best Model: XGBoost Classifier
  • Training Accuracy: 99.59%
  • Test Accuracy: 83.61%
  • Key Insights:
    • Higher cholesterol levels and exercise-induced angina are significant predictors of heart attacks.
    • A high ROC-AUC score (0.91) indicates a strong ability to distinguish between patients with and without heart attacks.
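These insights can be cross-checked against the fitted model's own feature importances. A minimal sketch, assuming the tuned XGBoost classifier from the grid search above and that X_train is a pandas DataFrame:

import pandas as pd

best_model = grid_search.best_estimator_

# Rank features by the importance scores learned by the boosted trees
importances = pd.Series(best_model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(10))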

Installation

Clone the repository and install the required dependencies:

git clone https://github.com/TravelXML/ML-HEART-ATTACK-EDA-PREDICTION-WITH-KERAS.git
cd ML-HEART-ATTACK-EDA-PREDICTION-WITH-KERAS
pip install -r requirements.txt

Usage

To run the model and reproduce the results, follow these steps:

  1. Prepare the Data: Ensure the dataset is available in the correct directory.
  2. Run the Jupyter Notebook: Open the provided Jupyter notebook and execute the cells.
  3. Model Evaluation: Evaluate the model's performance on your data.

jupyter notebook heart-attack-eda-prediction.ipynb

Happy Coding!