This project was done in partial fulfilment of the module SC1015 Introduction to Data Science & Artificial Intelligence.
It was done by SC8 Team 04, which consists of:
- Choo Jin Cheng (U2121190C)
- Chua Min Min (U2121126G)
- Poh Shi Qian (U2122452J)
Date completed: 24 April 2022
Below is a summary of our project. For more details, please read `mini_project.ipynb`.
Stroke is often caused by an unhealthy lifestyle and other health problems. Are there any unconventional causes?
According to the World Stroke Organization (n.d.), stroke is a "leading cause of death and disability globally". In 2019 alone, 6.6 million people died from strokes of varying severity (American Heart Association, 2021).
While age and chronic health conditions such as heart disease are commonly known to increase a person's risk of stroke, there may be unconventional factors that lead to an otherwise healthy person having a stroke. Hence, this project aims to uncover correlations, if any, between unconventional factors such as marital status and a person's chance of having a stroke.
Do unconventional features help to better predict whether a person will have / already has a stroke?
This is a classification problem. Our goal is to find out whether any unconventional feature makes a person more likely to have a stroke.
The dataset was obtained from Kaggle. It has the following fields:
- id: unique identifier
- gender: "Male", "Female" or "Other"
- age: age of the patient
- hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension
- heart_disease: 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease
- ever_married: "No" or "Yes"
- work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"
- Residence_type: "Rural" or "Urban"
- avg_glucose_level: average glucose level in blood
- bmi: body mass index
- smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"*
- stroke: 1 if the patient had a stroke or 0 if not
*Note: "Unknown" in smoking_status means that the information is unavailable for this patient
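As a minimal sketch of working with these fields, assuming pandas and a handful of made-up rows (not taken from the Kaggle dataset), "Unknown" in `smoking_status` can be mapped to a proper missing value and `id` dropped, since it is only an identifier. Whether the notebook handles these columns exactly this way is an assumption here.

```python
import numpy as np
import pandas as pd

# Toy rows using some of the column names listed above; the values are
# made up for illustration and are NOT taken from the Kaggle dataset.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "gender": ["Male", "Female", "Female", "Other"],
    "age": [67.0, 61.0, 13.0, 45.0],
    "ever_married": ["Yes", "Yes", "No", "No"],
    "smoking_status": ["formerly smoked", "Unknown", "Unknown", "smokes"],
    "stroke": [1, 1, 0, 0],
})

# "Unknown" means the information is unavailable, so treat it as missing.
df["smoking_status"] = df["smoking_status"].replace("Unknown", np.nan)

# "id" is only a unique identifier and carries no predictive information.
df = df.drop(columns="id")

print(df["smoking_status"].isna().sum())  # 2
```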
The unconventional features investigated are:
- ever_married
- work_type
- Residence_type
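The chi-square test of independence is one way to check whether a categorical feature such as `ever_married` is associated with `stroke`. The sketch below assumes pandas and SciPy's `chi2_contingency`, and uses hypothetical counts rather than the project's data, so its numbers say nothing about the actual findings.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical counts for illustration only; these are NOT the project's data.
# 60 married people (12 with stroke) and 40 unmarried people (2 with stroke).
df = pd.DataFrame({
    "ever_married": ["Yes"] * 60 + ["No"] * 40,
    "stroke": [1] * 12 + [0] * 48 + [1] * 2 + [0] * 38,
})

# Build the 2x2 contingency table of observed frequencies.
table = pd.crosstab(df["ever_married"], df["stroke"])

# chi2_contingency returns the statistic, the p-value, the degrees of freedom
# and the expected frequencies under independence. For a 2x2 table SciPy
# applies Yates' continuity correction by default.
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, dof={dof}")
```

A small p-value would suggest the feature and `stroke` are not independent; it says nothing about the direction or size of any effect.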
The notebook is organised as follows:
- Exploratory Data Analysis on the features
  - Plotting of graphs
  - Statistical summaries
  - Simple calculations
  - Correlation checks
- Data Cleaning and Preparation
  - Removal of rows/columns
  - Replacement of values
  - Encoding (Label/One-Hot)
- Post-cleaning Work
  - Correlation checks
  - Feature Selection (SelectKBest)
- Machine Learning
  - k-Nearest Neighbors
  - XGBoost
  - Artificial Neural Network
  - Naive Bayes
- Conclusion
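The encoding and feature-selection steps above can be sketched as follows, assuming pandas and scikit-learn. The rows are invented; binary features are mapped to 0/1 by hand, the non-binary `work_type` is one-hot encoded with `pd.get_dummies`, and `SelectKBest` with the `chi2` score function ranks the resulting columns. The choice of `k=3` and of the chi-square scorer are assumptions, not necessarily the settings used in the notebook.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Made-up rows for illustration; NOT the project's data.
df = pd.DataFrame({
    "work_type": ["Private", "Self-employed", "children", "Govt_job",
                  "Private", "Never_worked", "Self-employed", "Private"],
    "Residence_type": ["Urban", "Rural", "Urban", "Rural",
                       "Rural", "Urban", "Urban", "Rural"],
    "ever_married": ["Yes", "Yes", "No", "Yes", "No", "No", "Yes", "Yes"],
    "stroke": [1, 1, 0, 0, 0, 0, 1, 0],
})

# Label-encode the binary features: two categories become one 0/1 column.
df["ever_married"] = (df["ever_married"] == "Yes").astype(int)
df["Residence_type"] = (df["Residence_type"] == "Urban").astype(int)

# One-hot encode the non-binary feature: one 0/1 column per category.
df = pd.get_dummies(df, columns=["work_type"], dtype=int)

X = df.drop(columns="stroke")
y = df["stroke"]

# Score every feature with the chi-square statistic and keep the top k.
selector = SelectKBest(score_func=chi2, k=3)
selector.fit(X, y)

scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
selected = list(X.columns[selector.get_support()])
print(scores)
print(selected)
```

One-hot encoding avoids implying an order among the `work_type` categories, which a single integer label would do.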
Techniques and models used in this project:
- Chi-square test - checks for association between categorical features
- One-Hot encoding - encodes non-binary categorical features
- SelectKBest feature selection - provides insight into variable importance
- Synthetic Minority Over-sampling Technique (SMOTE) - compensates for our heavily imbalanced data
- k-Nearest Neighbors - model
- XGBoost - model
- Artificial Neural Network - model
- Naive Bayes - model
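SMOTE balances a dataset by creating synthetic minority-class samples on the line segment between a minority sample and one of its k nearest minority neighbours. The NumPy sketch below re-implements that core idea for illustration only; the function name `smote` and its parameters are hypothetical. In practice the `SMOTE` class from the imbalanced-learn package (via `fit_resample`) is the usual choice, and oversampling should be applied to the training split only so that synthetic points do not leak into the test set.

```python
import numpy as np


def smote(X_minority, n_synthetic, k=5, seed=0):
    """Hypothetical minimal SMOTE: interpolate between a minority sample
    and one of its k nearest minority neighbours (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k, n - 1)

    # Pairwise distances between minority samples; exclude self-matches.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                  # random minority sample
        nb = neighbours[base, rng.integers(k)]  # one of its k neighbours
        gap = rng.random()                      # interpolation factor in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[nb] - X_minority[base])
    return synthetic


# Toy imbalanced data: 95 majority vs 5 minority points (made up).
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(95, 2))
X_minor = rng.normal(3.0, 0.5, size=(5, 2))

X_new = smote(X_minor, n_synthetic=90, k=3)
X_balanced = np.vstack([X_major, X_minor, X_new])
y_balanced = np.array([0] * 95 + [1] * (5 + 90))
print(np.bincount(y_balanced))  # [95 95]
```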
Conclusions:
- Naive Bayes is the most suitable model for this dataset
- Unconventional features can help to better predict whether a person will have / already has a stroke
- 'work_type' is the most significant unconventional feature, followed by 'ever_married' and 'Residence_type'
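With heavily imbalanced data, overall accuracy can look high even when a model misses most strokes, so per-class recall and F1 are more informative when comparing models. The sketch below assumes scikit-learn and compares `KNeighborsClassifier` with `GaussianNB` on synthetic data from `make_classification`; it does not reproduce the project's results, and XGBoost and the neural network are left out to keep it short.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, heavily imbalanced stand-in data (about 5% positives);
# this is NOT the stroke dataset.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           weights=[0.95, 0.05], random_state=0)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
}

recalls = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Recall on the positive class: the share of actual positives caught.
    recalls[name] = recall_score(y_test, y_pred, pos_label=1)
    print(name)
    print(classification_report(y_test, y_pred, digits=3, zero_division=0))
```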
References:
- American Heart Association (2021). 2021 Heart Disease & Stroke Statistical Update Fact Sheet Global Burden of Disease. Professional Heart Daily. https://professional.heart.org/-/media/PHD-Files-2/Science-News/2/2021-Heart-and-Stroke-Stat-Update/2021_Stat_Update_factsheet_Global_Burden_of_Disease.pdf
- Bariatric Department at Lafayette General Medical Center (2019). How Obesity Affects Stroke Risk. Ochsner Lafayette General. https://ochsnerlg.org/about-us/news/how-obesity-affects-stroke-risk
- Huang, Y., Xu, S., Hua, J., Zhu, D., Liu, C., Hu, Y., Liu, T. & Xu, D. (2015). Association between job strain and risk of incident stroke: A meta-analysis. Neurology, 85(19), 1648-1654. https://doi.org/10.1212/WNL.0000000000002098
- WebMD (2021). Top 10 Causes of Strokes - Risk Factors and How You Can Lower Your Risks. WebMD. https://www.webmd.com/stroke/guide/stroke-causes-risks
- World Stroke Organization (n.d.). Learn about stroke. World Stroke Organization. https://www.world-stroke.org/world-stroke-day-campaign/why-stroke-matters/learn-about-stroke
- Wyller, T. B. (1999). Stroke and gender. The Journal of Gender-Specific Medicine, 2(3), 41–45.