The purpose of this project is to provide a stroke prediction service that GPs, doctors, and other health providers can use to understand a patient's risk of stroke. By picking up early indications of risk, health service providers can intervene and help patients understand their risk. They can then work with patients to design a plan or provide a prescription to reduce the risk of stroke.
The service will be provided as an API that the IT teams at health providers can use to support web portals and forms. Where health providers don't have IT teams to support them, the service will be provided as a Streamlit app.
Among the major cardiovascular diseases, stroke is one of the most dangerous and life-threatening, but a patient's life can be saved if the stroke is detected at an early stage. The literature reveals that patients often experience mini-strokes, also known as transient ischemic attacks (TIAs), before experiencing the actual stroke. Most of the literature is based on MRI and CT scan images for classifying cardiovascular diseases, including stroke, which is an expensive approach for diagnosing early strokes. In India, where cases of stroke are rising, there is a need to explore cheap, non-invasive methods for the diagnosis of early strokes.
Hence a simple service that uses tabular patient data about health and lifestyle factors would be cheaper and can scale.
It is believed there are early factors that could help predict stroke. Imagine clinics and GPs have your medical data, and every time you visit the GP, as soon as the latest data they collect on you is entered into the system, they are notified of your risk of suffering a stroke based on current health and lifestyle factors. Equipping clinics with an ML model can help them prioritise the health of patients at high risk of suffering a stroke and provide an intervention.
The dataset is available on Kaggle as the Stroke Prediction Dataset. For ease of discovery, the dataset has been downloaded and made available in this repo.
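For a quick look at the data, here is a minimal pandas sketch; the CSV filename is an assumption, use the file committed in this repo:

```python
import pandas as pd

# Filename assumed; adjust to the CSV committed in this repo
df = pd.read_csv("healthcare-dataset-stroke-data.csv")

# Tabular health and lifestyle features plus the binary target column `stroke`:
# gender, age, hypertension, heart_disease, ever_married, work_type,
# Residence_type, avg_glucose_level, bmi, smoking_status, stroke
print(df.shape)
print(df["stroke"].value_counts(normalize=True))  # the target is heavily imbalanced
```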
This section covers the tools used to run the project:
- Python for data exploration with pandas, seaborn, and matplotlib
- Python for the ML pipeline with sklearn, xgboost, and bentoml
- BentoML framework in Python to build the deployment service
- Bash for orchestrating model training, building the deployment container, and pushing it to ECR on the cloud
- AWS Fargate for deploying the model as a service on AWS
- Locust for local load testing of the BentoML API
- Streamlit for the prediction service app
- In the root directory, run:

```bash
pipenv shell
```

- Then run:

```bash
python training.py
```
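Roughly, `training.py` follows the shape sketched below; the column handling, hyperparameters, and CSV filename here are assumptions for illustration, so see the script itself for the real pipeline:

```python
import bentoml
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Filename assumed; use the CSV committed in this repo
df = pd.read_csv("healthcare-dataset-stroke-data.csv")
X = df.drop(columns=["id", "stroke"])  # drop the identifier and the target
y = df["stroke"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# One-hot encode categorical features, median-impute numeric ones (bmi has gaps)
categorical = X.select_dtypes("object").columns.tolist()
numeric = X.select_dtypes("number").columns.tolist()
preprocess = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical),
    (SimpleImputer(strategy="median"), numeric),
)

candidates = {
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "xgboost": XGBClassifier(eval_metric="logloss"),
}

# Train each candidate and keep the one with the highest validation ROC AUC
best_pipeline, best_auc = None, 0.0
for name, model in candidates.items():
    pipeline = make_pipeline(preprocess, model).fit(X_train, y_train)
    auc = roc_auc_score(y_val, pipeline.predict_proba(X_val)[:, 1])
    print(f"{name}: ROC AUC = {auc:.3f}")
    if auc > best_auc:
        best_pipeline, best_auc = pipeline, auc

# Save the winning pipeline to the local BentoML model store, exposing
# predict_proba so the service can return a risk probability
bentoml.sklearn.save_model(
    "stroke_detection_classifier",
    best_pipeline,
    signatures={"predict_proba": {"batchable": False}},
)
```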
- Run the following commands in order in bash from the root directory.

Note: Make sure `jq` is installed, and `docker` is installed and running.

Activate the pipenv virtual env shell:

```bash
pipenv shell
```

Then build and containerize the bento:

```bash
echo "Building the bento"
bentoml build

# Containerise the application
echo "Containerise bento and building it"
export SERVICE_NAME=stroke_detection_classifier
export MODEL_TAG=$(bentoml get stroke_detection_classifier:latest -o json | jq -r .version)
cd ~/bentoml/bentos/$SERVICE_NAME/$MODEL_TAG && bentoml containerize $SERVICE_NAME:latest
```
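`bentoml build` picks up the service definition in the repo. For orientation, a BentoML 1.x service for this model could look like the sketch below; the endpoint name, I/O types, and saved-model signature are assumptions, the actual definition lives in this repo:

```python
import bentoml
import pandas as pd
from bentoml.io import JSON

# Load the latest saved model from the local BentoML model store
model_runner = bentoml.sklearn.get("stroke_detection_classifier:latest").to_runner()

svc = bentoml.Service("stroke_detection_classifier", runners=[model_runner])

@svc.api(input=JSON(), output=JSON())
async def predict(input_data: dict) -> dict:
    # One patient record with health and lifestyle features
    df = pd.DataFrame([input_data])
    # predict_proba is available if the model was saved with that signature
    proba = await model_runner.predict_proba.async_run(df)
    return {"stroke_risk": float(proba[0][1])}
```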
Note: Make sure `jq` is installed, and `docker` is installed and running. Also make sure an AWS profile is configured locally with privileges to create an ECR repo and push an image.

Activate the pipenv virtual env shell:

```bash
pipenv shell
```

Run the following bash script in the root directory:

```bash
bash ./create_bento_artifacts.sh
```
This will train the model, build the service, containerize it, and push it to an ECR repo.
The name of the repo will be `stroke_detection_classifier:latest`.
Now we can follow video 7.6 to deploy it behind AWS Fargate.
We can then fetch the URL provided by the service and update the Streamlit app file. Navigate to the `app` folder, and inside `app.py` update the URL on line 31.
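For example, the updated line might look something like this; the variable name and endpoint path are assumptions, check `app.py` itself:

```python
# app.py, line 31: point the app at the deployed service (URL is a placeholder)
url = "http://<fargate-service-url>/predict"
```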
Then run the following command in the `app` directory:

```bash
pipenv run streamlit run app.py
```
Once the service is up, navigate to `localhost:8501` in the browser, change the member id to 1, and click on "Predict Stroke Risk". The app will send the request to AWS Fargate, and a prediction will be returned.
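Equivalently, the same prediction can be requested from Python. This sketch assumes the `/predict` endpoint and the Kaggle dataset's feature names, and targets the local service; swap in the Fargate URL for the deployed one:

```python
import requests

# One patient record using the Kaggle dataset's feature names (values illustrative)
patient = {
    "gender": "Male",
    "age": 67,
    "hypertension": 0,
    "heart_disease": 1,
    "ever_married": "Yes",
    "work_type": "Private",
    "Residence_type": "Urban",
    "avg_glucose_level": 228.69,
    "bmi": 36.6,
    "smoking_status": "formerly smoked",
}

# Targets the local service from `bentoml serve`; replace with the Fargate URL
# for the deployed service
response = requests.post("http://localhost:3000/predict", json=patient)
print(response.json())
```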
- Run the bento prediction service locally from the root directory:

```bash
pipenv run bentoml serve --production --reload -p 3000
```
- Then, in another terminal from the same root directory, run Locust:

```bash
pipenv run locust -H http://localhost:3000
```
Navigate to `http://0.0.0.0:8089` in the browser and start the load testing.
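If you want to tweak the load test, a minimal `locustfile.py` for this service might look like the sketch below; the endpoint name and payload are assumptions matching the sketches above:

```python
from locust import HttpUser, task, between

# One illustrative patient record using the Kaggle dataset's feature names
sample_patient = {
    "gender": "Male",
    "age": 67,
    "hypertension": 0,
    "heart_disease": 1,
    "ever_married": "Yes",
    "work_type": "Private",
    "Residence_type": "Urban",
    "avg_glucose_level": 228.69,
    "bmi": 36.6,
    "smoking_status": "formerly smoked",
}

class StrokeApiUser(HttpUser):
    wait_time = between(0.5, 2)  # seconds between requests per simulated user

    @task
    def predict(self):
        # POST one record to the prediction endpoint (name assumed)
        self.client.post("/predict", json=sample_patient)
```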
- Open the `exploratory_analysis.ipynb` and `training.ipynb` notebooks to see the data exploration and model selection strategy
- Open the `training.ipynb` notebook and the `training.py` script to see model selection and model building as ML pipelines
Note: Three models were trained (decision tree, logistic regression, and XGBoost), and the one with the highest ROC AUC was chosen.