Heart Attack Prediction Using Machine Learning

Overview

This repository contains a machine learning project aimed at predicting the likelihood of a heart attack based on a set of medical attributes. The dataset includes various patient features such as age, cholesterol levels, and exercise-induced angina, among others. The goal of this project is to develop a robust predictive model that can assist in early diagnosis and prevention of heart-related conditions.

Table of Contents

  • Features
  • Exploratory Data Analysis (EDA)
  • Data Preprocessing
  • Model Building
  • Model Evaluation
  • Hyperparameter Tuning
  • Results
  • Installation
  • Usage

Features

The dataset includes the following features:

  • Age: Age of the patient (years).
  • Sex: Gender of the patient (1 = Male, 0 = Female).
  • cp (Chest Pain Type):
    • 0: Typical angina
    • 1: Atypical angina
    • 2: Non-anginal pain
    • 3: Asymptomatic
  • trestbps: Resting blood pressure (in mm Hg).
  • chol: Serum cholesterol in mg/dl.
  • fbs: Fasting blood sugar > 120 mg/dl (1 = True; 0 = False).
  • restecg: Resting electrocardiographic results.
  • thalach: Maximum heart rate achieved.
  • exang: Exercise-induced angina (1 = Yes; 0 = No).
  • oldpeak: ST depression induced by exercise relative to rest.
  • slope: Slope of the peak exercise ST segment.
  • ca: Number of major vessels (0-3) colored by fluoroscopy.
  • thal: Thalassemia (0 = Normal; 1 = Fixed defect; 2 = Reversible defect).
  • target: Heart attack occurrence (1 = Yes, 0 = No).
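For orientation, the snippet below loads the dataset with pandas and inspects these columns. The file name heart.csv is an assumption; adjust it to the CSV actually shipped with this repository.

import pandas as pd

# Assumed file name -- replace with the dataset file used in the notebook
df = pd.read_csv("heart.csv")

print(df.shape)                      # number of rows and columns
print(df.dtypes)                     # data type of each feature
print(df["target"].value_counts())   # class balance of the target variable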

Exploratory Data Analysis (EDA)

Extensive EDA was performed to understand the data distribution, identify correlations, and uncover hidden patterns. Key steps included:

  • Univariate Analysis: Histograms, box plots, and density plots were created to inspect the distribution of individual features.
  • Bivariate Analysis: Pair plots and correlation heatmaps were used to explore relationships between features and the target variable.
  • Outlier Detection: Outliers were identified and analyzed using statistical methods.
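A minimal sketch of these EDA steps, assuming the data is loaded into a pandas DataFrame named df and that seaborn and matplotlib are available:

import matplotlib.pyplot as plt
import seaborn as sns

# Univariate analysis: distribution of a single continuous feature
sns.histplot(df["chol"], kde=True)
plt.show()

# Bivariate analysis: correlation heatmap across the (numeric) features
sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
plt.show()

# Outlier detection: flag values outside 1.5 * IQR for a given column
q1, q3 = df["trestbps"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["trestbps"] < q1 - 1.5 * iqr) | (df["trestbps"] > q3 + 1.5 * iqr)]
print(len(outliers), "potential outliers in trestbps")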

Data Preprocessing

To ensure the data was suitable for model building, the following preprocessing steps were taken:

  • Handling Missing Values: No missing values were found in the dataset.
  • Feature Scaling: Continuous features were standardized using z-score normalization.
  • Encoding Categorical Variables: Categorical features were converted into numerical values using one-hot encoding.
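A sketch of this preprocessing, assuming df holds the raw data; the split into continuous and categorical columns mirrors the feature list above and may differ from the notebook's exact choices:

from sklearn.preprocessing import StandardScaler
import pandas as pd

continuous = ["age", "trestbps", "chol", "thalach", "oldpeak"]
categorical = ["cp", "restecg", "slope", "ca", "thal"]

# One-hot encode the multi-class categorical features
df_encoded = pd.get_dummies(df, columns=categorical, drop_first=True)

# Standardize continuous features to zero mean and unit variance (z-score)
scaler = StandardScaler()
df_encoded[continuous] = scaler.fit_transform(df_encoded[continuous])

X = df_encoded.drop("target", axis=1)
y = df_encoded["target"]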

Model Building

Multiple machine learning models were tested to find the best-performing one. The models included:

  • Logistic Regression
  • Random Forest
  • XGBoost
  • Neural Networks (Keras)
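A hedged sketch of how the classical models could be trained and compared on a hold-out split (the 80/20 split and random seed are assumptions; XGBoost requires the separate xgboost package):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "XGBoost": XGBClassifier(random_state=42),
}

# Fit each model and report hold-out accuracy
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, "test accuracy:", clf.score(X_test, y_test))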

Model Architecture for Neural Networks:

from keras.models import Sequential
from keras.layers import Dense

# Feed-forward network: two hidden ReLU layers and a sigmoid output
# for binary classification (heart attack vs. no heart attack)
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Binary cross-entropy loss with the Adam optimizer; accuracy tracked during training
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=10)

Model Evaluation

The models were evaluated based on the following metrics:

  • Accuracy
  • Precision
  • Recall
  • F1-Score
  • ROC-AUC Score
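A sketch of how these metrics could be computed with scikit-learn for a fitted classifier clf (assumed to expose predict and predict_proba):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-Score :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_prob))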

Results:

  • Training Accuracy: 99.59%
  • Test Accuracy: 83.61%
  • ROC-AUC Score: 0.91 (for the best model)

Hyperparameter Tuning

Hyperparameter tuning was performed using GridSearchCV and RandomizedSearchCV to optimize model performance; the example below applies GridSearchCV to the XGBoost classifier.

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2]
}

# 5-fold cross-validated grid search over the XGBoost hyperparameters
xgb_clf = XGBClassifier(random_state=42)
grid_search = GridSearchCV(estimator=xgb_clf, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)
print("Best CV accuracy:", grid_search.best_score_)

Results

The final model demonstrates strong predictive capability with the following key results:

  • Best Model: XGBoost Classifier
  • Training Accuracy: 99.59%
  • Test Accuracy: 83.61%
  • Key Insights:
    • Higher cholesterol levels and exercise-induced angina are significant predictors of heart attacks.
    • A high ROC-AUC score (0.91) indicates a strong ability to distinguish between patients with and without heart attacks.
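These insights can be cross-checked against the fitted model's own feature importances. A minimal sketch, assuming the tuned XGBoost classifier from the grid search above and that X_train is a pandas DataFrame:

import pandas as pd

best_model = grid_search.best_estimator_

# Rank features by the importance scores learned by the boosted trees
importances = pd.Series(best_model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(10))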

Installation

Clone the repository and install the required dependencies:

git clone https://github.com/TravelXML/ML-HEART-ATTACK-EDA-PREDICTION-WITH-KERAS.git
cd ML-HEART-ATTACK-EDA-PREDICTION-WITH-KERAS
pip install -r requirements.txt

Usage

To run the model and reproduce the results, follow these steps:

  1. Prepare the Data: Ensure the dataset is available in the correct directory.
  2. Run the Jupyter Notebook: Open the provided Jupyter notebook and execute the cells.
  3. Model Evaluation: Evaluate the model's performance on your data.

jupyter notebook heart-attack-eda-prediction.ipynb

Happy Coding!