This repository helps you quickly set up a data science or machine learning project with automated training and deployment using GitHub Actions, DagsHub, and AWS EC2.
The following steps run automatically using GitHub Actions and DagsHub:
- Pull and process data from GitHub using Pandas and Python.
- Train your model and track experiments with MLflow and scikit-learn.
- Validate the model and save the serialized model and metadata to DVC on DagsHub.
- Deploy your model to an AWS EC2 instance.
- Monitor the metrics with MLflow.
- Retrain if necessary.
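
The validate-then-deploy-or-retrain decision in the list above can be sketched as a small gate. This is a minimal illustration, not the repository's actual logic: the `MIN_SCORES` thresholds, the `Decision` class, and the `validate` function are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical minimum scores; the repository does not state its thresholds.
MIN_SCORES = {"f1": 0.90, "precision": 0.90, "recall": 0.85}


@dataclass
class Decision:
    deploy: bool   # True when every metric meets its minimum
    failed: list   # names of the metrics that fell below their minimum


def validate(metrics: dict, min_scores: dict = MIN_SCORES) -> Decision:
    """Compare each tracked metric to its minimum and decide the next step."""
    # A metric missing from `metrics` is treated as 0.0, so it fails the gate.
    failed = [name for name, minimum in min_scores.items()
              if metrics.get(name, 0.0) < minimum]
    return Decision(deploy=not failed, failed=failed)


if __name__ == "__main__":
    decision = validate({"f1": 0.93, "precision": 0.95, "recall": 0.80})
    if decision.deploy:
        print("deploy")
    else:
        print(f"retrain (below threshold: {decision.failed})")
```

Keeping the gate as a pure function makes it easy to call from a workflow step and to unit test without a trained model.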
The following prerequisites are required to make this repository work:
- AWS subscription
- Access to DagsHub
- Access to GitHub Actions
- Python 3.9.1
- DVC 2.11
- You can find all the additional information in the requirements.txt file.
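
Before running the pipeline, it can help to confirm that the local environment matches the versions listed above. The following is a rough sketch using only the standard library; the helper names `python_matches` and `installed_version` are made up for this example, and it assumes DVC was installed with pip under the package name `dvc`.

```python
import sys
from importlib import metadata


def python_matches(version_info=sys.version_info, required=(3, 9)):
    """True when the interpreter's major.minor version equals the required one."""
    return tuple(version_info[:2]) == tuple(required)


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python 3.9:", python_matches())
    print("DVC version:", installed_version("dvc"))  # the prerequisites list DVC 2.11
```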
GitHub Actions contains five main components as shown below.
DagsHub provides MLflow and DVC capabilities while letting you keep working on GitHub. The following results are the experiments from DagsHub, using MLflow to track the model's F1-score, precision, and recall.
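
For reference, the three tracked metrics can be computed from true and predicted labels as in the sketch below. It assumes `spam` is the positive class and uses made-up labels; the function name `precision_recall_f1` is illustrative and is not taken from the repository.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for illustration only.
y_true = ["spam", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Inside an MLflow run, each value could then be recorded with `mlflow.log_metric`, for example `mlflow.log_metric("f1", f1)`, so that it appears in the experiment view.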
The following animation shows the pipeline running on GitHub Actions, from data extraction to model training.
This is the final result after the model is deployed into production.
- Input: "Using DagsHub and Github Actions is so cool".
- Response: "Ham", meaning that the previous message is not spam :)
Read the full article on my Medium and follow me for more content.