Here you'll find a solution for selecting and training a classification model that predicts the risk of heart disease, serving it as a web service with Flask, and containerizing it with Docker.
The risk of heart disease can be associated with health conditions and with factors such as lifestyle, age, and family history. A binary classification model trained on these variables, collected from patients, can support clinical diagnosis.
A reduced version of the CDC dataset (February 2022 update), containing several features that can be associated with the risk of heart disease, is used to build the classification model. "Reduced" here means that the 300 variables in the original dataset were cut down to about 20.
The imbalanced data consists of 17 attributes and one target variable, `heart_disease`.
The best model, XGBoost, was selected from the following candidates:
- Logistic Regression
- Decision Tree
- Random Forest
- XGBoost
First, baseline models were trained without handling the class imbalance; then new models were built that handle it with class weights. Resampling techniques such as oversampling or undersampling were not applied.
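The exact weighting used in the notebook is not stated here, so the following is only a minimal sketch assuming the common "balanced" heuristic, where each class gets weight `n_samples / (n_classes * count_c)` (the formula scikit-learn uses for `class_weight="balanced"`). For XGBoost, the analogous knob is `scale_pos_weight`, for which the XGBoost documentation suggests the ratio of negative to positive examples. The toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * count_c)}.

    Rare classes get larger weights, so misclassifying them costs more
    during training (same formula as scikit-learn's "balanced" mode).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


def scale_pos_weight(labels, positive=1):
    """Ratio of negative to positive examples, a typical starting value
    for XGBoost's scale_pos_weight on imbalanced binary problems."""
    pos = sum(1 for y in labels if y == positive)
    neg = len(labels) - pos
    return neg / pos


# Hypothetical toy labels: 9 patients without heart disease, 1 with.
y = [0] * 9 + [1]
weights = balanced_class_weights(y)
spw = scale_pos_weight(y)
print(weights)
print(spw)  # 9.0
```

Here the single positive example receives weight 5.0 while each negative receives 5/9, so both classes contribute equally to the total weighted loss.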
Several metrics were calculated, but the most informative for this imbalanced problem were F1, recall, and MCC (Matthews correlation coefficient).
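To make these metrics concrete, here is a small from-scratch sketch that computes them from the confusion matrix, treating label 1 (heart disease) as the positive class. The toy labels are invented for illustration, and returning 0.0 when a denominator is zero is a convention chosen here. The all-negative baseline shows why accuracy alone misleads on imbalanced data: it still reaches 0.7 accuracy while recall, F1, and MCC are all 0.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, recall, F1 and MCC for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of sick patients caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1, "mcc": mcc}


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]  # TP=2, FN=1, FP=1, TN=6
m = binary_metrics(y_true, y_pred)
baseline = binary_metrics(y_true, [0] * 10)  # always predicts "no disease"
print(m)         # recall = f1 = 2/3, mcc = 11/21
print(baseline)  # accuracy 0.7, but recall = f1 = mcc = 0.0
```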
- Create a virtual environment with the tool of your choice (Pipenv, conda, or venv) and activate it.
- Clone this repository
- Install the project dependencies: `pip install -r requirements.txt`
- Inside the repository, create a new directory called `data` and manually download this Kaggle dataset into it.
- Run and inspect `Notebook.ipynb`.
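- The best model is saved with pickle. From here on you will need the file `web_service.py`. If you want to run the container, first build the image (assuming the Dockerfile sits at the repository root) with `docker build -t web_service .`, then run `docker run -it --rm -p 7860:7860 web_service` and open the local address shown in the output to use the service.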
If you just want to run the Flask API, run the following command from the command line: `python web_service.py`
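Since the web service depends on the pickled model, here is a minimal sketch of the save/load round trip with Python's standard `pickle` module. The dictionary is a hypothetical stand-in for the fitted XGBoost model (and any preprocessing objects saved with it), and `model.bin` is a hypothetical file name; the actual names used by `Notebook.ipynb` and `web_service.py` may differ.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the trained artifacts; in the project this
# would be the fitted XGBoost classifier saved by the notebook.
artifacts = {"model_name": "xgboost", "threshold": 0.5}


def save_artifacts(obj, path: Path) -> None:
    """Serialize obj to path; pickle requires binary mode ("wb")."""
    with path.open("wb") as f_out:
        pickle.dump(obj, f_out)


def load_artifacts(path: Path):
    """Deserialize a pickled object. Only unpickle files you trust:
    loading a pickle can execute arbitrary code."""
    with path.open("rb") as f_in:
        return pickle.load(f_in)


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.bin"  # hypothetical file name
    save_artifacts(artifacts, model_path)  # what the notebook would do
    restored = load_artifacts(model_path)  # what the web service would do at startup

print(restored == artifacts)  # True
```

Loading the model once at startup, rather than on every request, keeps prediction latency low in the service.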