MIT License

Heart-failure-prediction

As part of my work for the Big Data and Official Statistics course, I surpassed the results reported by the authors of the original paper by 3% in accuracy and 5.3% in F1 score, using feature engineering alone.

All technical work is available in the following Google Colab notebook.

Feel free to leave your comments/feedback in Colab.

Introduction

Background

  • Cardiovascular diseases kill approximately 17 million people globally every year, and they manifest mainly as myocardial infarctions and heart failure.
  • Heart failure (HF) occurs when the heart cannot pump enough blood to meet the needs of the body. Electronic medical records quantify patients' symptoms, body features, and clinical laboratory test values, which can be used for biostatistical analysis aimed at highlighting patterns and correlations that medical doctors would otherwise be unable to detect.
  • Machine learning, in particular, can predict patients’ survival from their data and can identify the most important features among those included in their medical records.

Methods

  • In this project and paper, I analyzed a dataset of 299 patients with heart failure.
  • I applied six machine learning classifiers both to predict patient survival and to rank the features corresponding to the most important risk factors.
  • Both feature ranking approaches clearly identify the time attribute (along with two other features that I engineered from it), serum creatinine, and ejection fraction as the most relevant features, so I then built the machine learning survival prediction models on these factors alone.
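The engineer-then-rank step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the Colab notebook: the records are invented toy values, the two time-derived features (`log_time` and `short_follow_up`) are hypothetical stand-ins for the engineered features, and ranking by absolute Pearson correlation with the outcome is a simple substitute for the feature ranking approaches used in the project.

```python
import math

# Toy records invented for illustration only (NOT rows from the real dataset).
# Key names are assumptions; "death" is 1 if the patient died during follow-up.
records = [
    {"time": 8,   "serum_creatinine": 2.7, "ejection_fraction": 20, "death": 1},
    {"time": 15,  "serum_creatinine": 1.9, "ejection_fraction": 25, "death": 1},
    {"time": 30,  "serum_creatinine": 1.8, "ejection_fraction": 30, "death": 1},
    {"time": 95,  "serum_creatinine": 1.1, "ejection_fraction": 38, "death": 0},
    {"time": 120, "serum_creatinine": 1.3, "ejection_fraction": 35, "death": 1},
    {"time": 180, "serum_creatinine": 0.9, "ejection_fraction": 45, "death": 0},
    {"time": 210, "serum_creatinine": 1.0, "ejection_fraction": 60, "death": 0},
    {"time": 250, "serum_creatinine": 0.8, "ejection_fraction": 40, "death": 0},
]


def add_time_features(row, short_follow_up_days=60):
    """Return a copy of `row` with two hypothetical time-derived features."""
    out = dict(row)
    out["log_time"] = math.log1p(row["time"])  # compress long follow-ups
    out["short_follow_up"] = 1 if row["time"] < short_follow_up_days else 0
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def rank_features(rows, target="death"):
    """Rank features by absolute correlation with the target, strongest first."""
    ys = [r[target] for r in rows]
    names = [k for k in rows[0] if k != target]
    scores = {n: abs(pearson([r[n] for r in rows], ys)) for n in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


engineered = [add_time_features(r) for r in records]
ranking = rank_features(engineered)
for name, score in ranking:
    print(f"{name:18s} {score:.3f}")
```

With the two derived columns added, each record carries five candidate features, which matches the five-feature setup described above. The scores printed for the toy data say nothing about the real dataset.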

Results

The results of these five-feature models show

  • not only that time, serum creatinine, and ejection fraction are sufficient to predict the survival of heart failure patients from medical records,
  • but also that using these features alone can lead to more accurate predictions than using the complete set of original dataset features.
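A comparison of this kind (a reduced feature set against the full one) can be sketched as below. Everything here is an assumption made for illustration: the rows are invented toy values, the column names are hypothetical, and a 1-nearest-neighbour classifier scored with leave-one-out cross-validation is a stand-in chosen for brevity rather than one of the classifiers used in the project.

```python
import math

# Hypothetical column names and invented toy rows (NOT the real dataset).
feature_names = ["age", "platelets", "time", "serum_creatinine", "ejection_fraction"]
rows = [
    [75, 265000, 8, 2.7, 20],
    [55, 263000, 15, 1.9, 25],
    [65, 162000, 30, 1.8, 30],
    [50, 210000, 95, 1.1, 38],
    [60, 327000, 120, 1.3, 35],
    [45, 204000, 180, 0.9, 45],
    [70, 290000, 210, 1.0, 60],
    [58, 250000, 250, 0.8, 40],
]
labels = [1, 1, 1, 0, 1, 0, 0, 0]  # 1 = died, 0 = survived


def select_columns(data, names, keep):
    """Keep only the columns whose names appear in `keep`."""
    idx = [names.index(k) for k in keep]
    return [[row[i] for i in idx] for row in data]


def fit_scaler(data):
    """Per-column mean and standard deviation (std of 0 is replaced by 1)."""
    cols = list(zip(*data))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def loo_accuracy(data, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(data)):
        train_x = data[:i] + data[i + 1:]
        train_y = y[:i] + y[i + 1:]
        # Fit the scaler on the training fold only, to avoid leakage.
        means, stds = fit_scaler(train_x)
        scaled = [scale(r, means, stds) for r in train_x]
        query = scale(data[i], means, stds)
        nearest = min(range(len(scaled)), key=lambda j: math.dist(query, scaled[j]))
        correct += int(train_y[nearest] == y[i])
    return correct / len(data)


subset = ["time", "serum_creatinine", "ejection_fraction"]
acc_all = loo_accuracy(rows, labels)
acc_subset = loo_accuracy(select_columns(rows, feature_names, subset), labels)
print(f"all features:    {acc_all:.2f}")
print(f"subset features: {acc_subset:.2f}")
```

On eight invented rows the two numbers carry no evidential weight; the sketch only shows the shape of the comparison, with scaling fitted inside each fold so that both feature sets are evaluated on equal terms.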

Analysis

1.1. Dataset: Overview

image

1.2. Dataset: Feature Engineering

image

2. Methods

image

3. Features Ranking

image

4. Metrics

image
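For reference, accuracy and F1 (the two metrics quoted at the top of this README) can be computed from a set of predictions as in the sketch below. The labels and predictions are invented for illustration, and treating death (label 1) as the positive class is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example: 3 deaths and 5 survivors, with one error of each kind.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # 6 of 8 correct -> 0.750
print(f"F1       = {f1_score(y_true, y_pred):.3f}")  # precision = recall = 2/3 -> 0.667
```

Because F1 ignores true negatives, it is more sensitive than accuracy to how well the minority class is predicted, which is why the two metrics are usually reported together on imbalanced data.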

Conclusions

  • This discovery has the potential to influence clinical practice by becoming a new supporting tool for physicians when predicting whether a heart failure patient will survive.
  • Indeed, medical doctors trying to assess whether a patient will survive after heart failure may focus mainly on serum creatinine and ejection fraction.
  • A limitation of the present study is the small size of the dataset (299 patients): a larger dataset would have yielded more reliable results.
  • Additional information about the physical features of the patients (height, weight, body mass index, etc.) and their occupational history would have been useful for detecting additional risk factors for cardiovascular diseases.

References

  • Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone – Chicco D, Jurman G.
  • Treatment of heart failure with preserved ejection fraction: reflections on its treatment with an aldosterone antagonist – Pfeffer MA, Braunwald E.
  • Machine learning for prediction of sudden cardiac death in heart failure patients with low left ventricular ejection fraction: study protocol for a retroprospective multicentre registry in China – Meng F, Zhang Z, Hou X.
  • Targets for heart failure with preserved ejection fraction – Nanayakkara S, Kaye DM.