A popular dataset on Kaggle is the Heart Disease UCI dataset.
This notebook explores the dataset from several angles: initial data assessment and cleaning, univariate, bivariate, and multivariate exploration, predictive modeling, Bayes' rule, and an in-depth visualization progression.
- Python
- Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn
- The sample used for the research appears to have been constructed by matching on age and sex with respect to heart disease status
- The strongest individual numerical predictors are maximum heart rate achieved and a feature called old peak
- Together, these two features predicted heart disease with an accuracy of almost 70%
- This is not sufficient to produce a strong test based on Bayes' rule: a positive result raises the chance of having heart disease only modestly
- More findings to come!
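
The Bayes' rule finding above can be illustrated with a minimal sketch. The numbers are assumptions for illustration only, not results from the notebook: the roughly 70% accuracy is treated as both the sensitivity and the specificity of the test, and the 10% prevalence is hypothetical. The function name `posterior_given_positive` is also hypothetical.

```python
def posterior_given_positive(sensitivity, specificity, prevalence):
    """Return P(disease | positive test) using Bayes' rule."""
    # P(positive and disease)
    true_pos = sensitivity * prevalence
    # P(positive and no disease)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed inputs: 70% sensitivity and specificity, 10% prevalence (hypothetical).
posterior = posterior_given_positive(0.70, 0.70, 0.10)
print(f"P(disease | positive) = {posterior:.3f}")  # prints 0.206
```

Under these assumptions a positive result moves the probability of heart disease from 10% to about 21%, which is why a test at this accuracy level is weak on its own.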