Hey there! 👋 Welcome to my machine learning project, where I’m working on something super important—predicting in-hospital mortality. The whole idea is to create a reproducible pipeline, so whether you’re running it on your laptop or in the cloud, you can easily follow along and get the same results. ✨
I’ve pulled together medical data and built an end-to-end machine learning project. To make it all happen, I’m using tools like MLflow, Mage, Docker, Google Cloud, Evidently AI, and Grafana.
The goal? Make sure everything runs smoothly, scales well, and, most importantly, stays reproducible. Ultimately, it’s about giving healthcare professionals the insights they need to help their patients in the best possible way.
In-hospital mortality is a pressing challenge in healthcare today. The ability to predict it can be crucial for better patient management and improved outcomes.
The objective of this project is to develop a machine learning model that can accurately predict the likelihood of a patient’s mortality during their hospital stay. The goal is to provide healthcare professionals with valuable insights that can potentially save lives.
The dataset I used for this project comes from the Kaggle In-Hospital Mortality Prediction dataset, which provides a comprehensive collection of ICU patient data, including physiological measurements and clinical information.
- Number of records: 1,178 ICU stays
- Original number of features: 51 columns
- Reduced number of features: 24 columns
To simplify the task and focus on the most relevant clinical features, I reduced the dataset from its original 51 columns to 24 columns, stored in the data/processed_data.csv file. This allows for a more efficient model-building process while maintaining accuracy.
For more detailed information about the dataset, including explanations of the columns, please refer to data/README.md.
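As a rough illustration of the column reduction described above, here is a minimal sketch that keeps a chosen subset of columns from a CSV. It uses only the standard library so it runs without extra dependencies. The column names, the `KEEP_COLUMNS` list, and the `reduce_columns` helper are hypothetical and are not taken from the project; the real columns are documented in data/README.md.

```python
import csv
import io

# Hypothetical column names for illustration only;
# the project's actual columns are described in data/README.md.
KEEP_COLUMNS = ["age", "heart_rate", "outcome"]


def reduce_columns(rows, keep):
    """Return new row dicts containing only the columns listed in `keep`."""
    return [{name: row[name] for name in keep} for row in rows]


# Tiny in-memory stand-in for the raw CSV so the sketch runs on its own.
raw_csv = io.StringIO(
    "age,gender,heart_rate,bmi,outcome\n"
    "72,1,88,30.1,0\n"
    "65,2,101,27.4,1\n"
)
rows = list(csv.DictReader(raw_csv))
reduced = reduce_columns(rows, KEEP_COLUMNS)

# Write the reduced table back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=KEEP_COLUMNS)
writer.writeheader()
writer.writerows(reduced)
print(out.getvalue())
```

In practice you would swap the in-memory strings for real file handles, and the list of columns to keep would come from your own feature selection.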
- Cloud Platform: Google Cloud Platform (GCP)
- Container Registry: Google Container Registry (GCR)
- Containerization: Docker and Docker Compose
- Orchestration: Google Cloud Run
- Continuous Integration/Continuous Deployment (CI/CD): GitHub Actions
- Pipeline Development: Mage (local pipeline orchestration)
- Experiment Tracking and Model Management: MLflow (local experiment tracking and model management)
- Model Artifacts Storage: Google Cloud Storage
- Model Monitoring: Evidently AI and Grafana
- Language: Python
- Web Application Framework (Local Deployment): Flask (for local web deployment)
- Web Application Framework (Cloud Deployment): Streamlit (for cloud-based web deployment)
Warning: Following the steps in this README may cost you money, since Google Cloud is a paid service. Be sure to shut down any Google Cloud service you no longer need to avoid charges.
You can check out the live version of the Streamlit app at the following URL:
The exploratory data analysis and modeling live in the notebooks directory. The analysis.ipynb notebook covers the exploratory data analysis and baseline model building. The experiment.ipynb notebook is where I practiced experiment tracking before integrating it with the Mage pipeline. The notebooks directory also contains a directory called model, where the baseline model is stored.
Warning (again): Using Google Cloud services costs money. If you don't have credits (you get $300 USD when you first sign up), you will be charged. Delete your resources and shut down your work when finished to avoid charges.
Before setting up the project, ensure you have completed the following prerequisites. Detailed, step-by-step instructions for each prerequisite can be found in their respective files located in the setup folder.
- Google Cloud Platform (GCP) Account: An active GCP account to use Google Cloud services. See the detailed guide for instructions.
- Docker: Install Docker and Docker Compose for containerization and local development. See the detailed guide for instructions.
- Python: Make sure you have Python installed (Python 3.12 recommended). See the detailed guide for instructions.
- Google Cloud SDK: Install Google Cloud SDK for managing Google Cloud resources. See the detailed guide for instructions.
- GitHub Account: For CI/CD pipelines using GitHub Actions. See the detailed guide for instructions.
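If you want a quick sanity check before moving on, something like the sketch below can tell you whether the command-line tools are on your PATH and whether your Python version matches the recommendation. This is not part of the project. The `check_prerequisites` helper is hypothetical, and the command names (`git`, `docker`, `gcloud`) are assumptions about how each tool is exposed on your system.

```python
import shutil
import sys

# Command names are assumptions about how each tool appears on PATH.
REQUIRED_TOOLS = ["git", "docker", "gcloud"]


def check_prerequisites(tools=REQUIRED_TOOLS, min_python=(3, 12)):
    """Return a dict mapping each check to True (found) or False (missing)."""
    # shutil.which returns the executable's path, or None if it is not on PATH.
    results = {tool: shutil.which(tool) is not None for tool in tools}
    # Python 3.12 is only recommended, so False here is a hint, not a failure.
    results["python"] = sys.version_info >= min_python
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing entry only means the command was not found on PATH. The guides in the setup folder remain the place to go for the actual installation steps.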
git clone https://github.com/Tobai24/In-Hospital-Mortality-Prediction.git
cd In-Hospital-Mortality-Prediction
Option 1: Using pipenv (Recommended)
- Install pipenv: pip install pipenv
- Install the dependencies with pipenv: pipenv install
- Activate the pipenv shell: pipenv shell
Option 2: Using requirements.txt (for users who prefer pip)
- Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate # On Windows use: venv\Scripts\activate
- Install the dependencies:
  pip install -r requirements.txt
- If you use pipenv, you do not need to install the requirements.txt dependencies manually, because pipenv reads the Pipfile and manages the environment for you.
Ready to dive into your project? Here’s a quick guide to get you started. Each step is broken down into folders, so you’ll find all the info you need right where you need it.
- Tools Used: Mage for pipelines and MLflow for experiment tracking.
- Where to Find It: Check out the workflow folder in your project directory. The README in this directory will walk you through setting up Mage and MLflow for this project.
- Tools Used: Flask for building your web app and Docker for containerizing it.
- Where to Find It: Head over to the deployment/local_deployment folder. This README covers everything you need to get your app running locally. It’s got the details for setting up Flask and Docker, so you can test things out on your own machine.
- Tools Used: Google Cloud Platform (GCP) for hosting your app and Streamlit for the web interface.
- Where to Find It: Navigate to the deployment/web_deployment folder. The README guides you through deploying your app to GCP using Streamlit. It’s perfect for getting your app live on the cloud.
It also includes steps for continuous deployment with GitHub Actions.
- Tools Used: Grafana for creating dashboards and Evidently AI for monitoring your models.
- Where to Find It: You’ll find this info in the monitoring folder. This README will show you how to set up Grafana and Evidently AI to keep an eye on your app’s performance and metrics.
Feel free to explore these folders and read through the instructions. Each README is packed with the details you need to get everything up and running smoothly. Enjoy setting up your project! 🚀
A huge thank you to DataTalks.Club for offering their MLOps course for free! 🌟 The insights and skills I gained from their course were crucial in developing this project. If you want to enhance your expertise in MLOps, I highly recommend checking out their course. It’s an invaluable resource for diving into MLOps technologies! 📚✨