
Heart disease is a major global health concern that affects millions of people around the world. Early detection and accurate prediction of heart disease can help to prevent the progression of the disease and save lives. In this project, we aim to develop a predictive model for heart disease using various machine learning algorithms.


Heart Disease Prediction

In this project we predict the risk of heart disease using a real-time dataset obtained from a medical university in Chennai.

DATASET

The dataset we used contains more than 150 patient records together with their daily-routine data, which helps our model train on real-time data. It includes features such as age, dietary habits, and whether the patient consumes alcohol.

Dataset: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

Data Preprocessing

We preprocessed our dataset using scikit-learn.

scikit-learn

The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators. In general, learning algorithms benefit from standardization of the data set.
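
Below is a minimal sketch of the kind of preprocessing this refers to, assuming the records have been exported to a CSV file; the file name `heart.csv` and the label column `target` are placeholders, not necessarily the names used in the notebook. The model snippets further down reuse the `X_train`, `X_test`, `y_train`, `y_test` variables created here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical file and column names -- adjust to the real dataset.
df = pd.read_csv("heart.csv")
X = df.drop(columns=["target"])
y = df["target"]

# Hold out a test set for the per-model accuracy figures reported below.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Standardize features so scale-sensitive models (SVM, logistic regression) behave well.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```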

Models Used:

  1. SVM
  2. Naive Bayes
  3. Logistic Regression
  4. Decision Tree
  5. Random Forest
  6. LightGBM
  7. XGBoost

Model Results (Training and Testing Accuracy):

SVM:

A Support Vector Machine classifies patients by finding the hyperplane that best separates the two classes.

    ~Training Set Accuracy : 0.6694214876033058

    ~Testing Set Accuracy : 0.5737704918032787
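
A short sketch of how the SVM scores could be computed with scikit-learn, reusing the split from the preprocessing sketch above; the RBF kernel and default parameters are assumptions.

```python
from sklearn.svm import SVC

# Assumes X_train, X_test, y_train, y_test from the preprocessing sketch.
svm = SVC(kernel="rbf")  # kernel choice is an assumption
svm.fit(X_train, y_train)
print("Training accuracy:", svm.score(X_train, y_train))
print("Testing accuracy :", svm.score(X_test, y_test))
```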

Naive Bayes:

Applies Bayes' theorem with the naive assumption that the features are conditionally independent of each other.

    ~Training Set Accuracy : 0.8677685950413223

    ~Testing Set Accuracy : 0.7868852459016393
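
A minimal sketch with GaussianNB; the Gaussian variant is an assumption, as the notebook may use a different Naive Bayes flavour.

```python
from sklearn.naive_bayes import GaussianNB

# Assumes the train/test split from the preprocessing sketch.
nb = GaussianNB()
nb.fit(X_train, y_train)
print("Training accuracy:", nb.score(X_train, y_train))
print("Testing accuracy :", nb.score(X_test, y_test))
```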

Logistic Regression:

Logistic regression estimates the probability of the disease from a given set of independent variables.

    ~Training Set Accuracy : 0.8636363636363636

    ~Testing Set Accuracy : 0.8032786885245902
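
A sketch with scikit-learn's LogisticRegression; `max_iter=1000` is only a safe default to guarantee convergence and not necessarily the setting used in the notebook.

```python
from sklearn.linear_model import LogisticRegression

# Assumes the train/test split from the preprocessing sketch.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)
print("Training accuracy:", logreg.score(X_train, y_train))
print("Testing accuracy :", logreg.score(X_test, y_test))

# Estimated probability of disease for each test patient.
disease_probability = logreg.predict_proba(X_test)[:, 1]
```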

Decision Tree:

A decision tree is a non-parametric supervised learning algorithm that can be used for both classification and regression tasks.

    ~Training Set Accuracy : 1.0

    ~Testing Set Accuracy : 0.7704918032786885
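
A minimal DecisionTreeClassifier sketch; with no depth limit the tree can memorize the training set, which matches the perfect training accuracy reported above.

```python
from sklearn.tree import DecisionTreeClassifier

# Assumes the train/test split from the preprocessing sketch.
tree = DecisionTreeClassifier(random_state=42)  # no max_depth -> can fit the training data perfectly
tree.fit(X_train, y_train)
print("Training accuracy:", tree.score(X_train, y_train))
print("Testing accuracy :", tree.score(X_test, y_test))
```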

Random Forest:

An ensemble method that trains many decision trees on random subsets of the data and combines their votes into a single prediction.

    ~Training Set Accuracy : 1.0

    ~Testing Set Accuracy : 0.7704918032786885
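
A RandomForestClassifier sketch; `n_estimators=100` is scikit-learn's default and only an assumption about the configuration actually used.

```python
from sklearn.ensemble import RandomForestClassifier

# Assumes the train/test split from the preprocessing sketch.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print("Training accuracy:", rf.score(X_train, y_train))
print("Testing accuracy :", rf.score(X_test, y_test))
```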

LightGBM:

LightGBM is a Gradient Boosting Decision Tree (GBDT) algorithm that adds two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB).

    ~Training Set Accuracy : 0.9958677685950413

    ~Testing Set Accuracy : 0.7704918032786885
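
A sketch using LightGBM's scikit-learn interface with default parameters (an assumption).

```python
from lightgbm import LGBMClassifier

# Assumes the train/test split from the preprocessing sketch.
lgbm = LGBMClassifier(random_state=42)
lgbm.fit(X_train, y_train)
print("Training accuracy:", lgbm.score(X_train, y_train))
print("Testing accuracy :", lgbm.score(X_test, y_test))
```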

XGBoost:

XGBoost is a gradient boosting library that builds a decision tree for each boosting iteration, one level at a time, and can process the entire dataset concurrently on the GPU.

    ~Training Set Accuracy : 0.987603305785124

    ~Testing Set Accuracy : 0.7540983606557377
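
A sketch using XGBoost's scikit-learn API; default (CPU) parameters are assumed, so the GPU-based training mentioned above is not shown.

```python
from xgboost import XGBClassifier

# Assumes the train/test split from the preprocessing sketch.
xgb = XGBClassifier(random_state=42)
xgb.fit(X_train, y_train)
print("Training accuracy:", xgb.score(X_train, y_train))
print("Testing accuracy :", xgb.score(X_test, y_test))
```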

CONCLUSION:

  From the results above, **Logistic Regression** is the most suitable model: it achieves the highest testing accuracy while keeping a small gap between its training and testing scores, whereas the tree-based models fit the training set almost perfectly but generalize less well.