DVC + Github Actions

Note: this is a very experimental project, if you have any information and tips that can improve this, feel free to reach out and open an issue.

When creating machine learning projects we usually stumble upon a few very challenging problems:

Data and Model Versioning
Managing Experiments.
Model Deployment
Validation and Monitoring

This project is my attempt to solve at least the first three problems listed above.

Data and Model Versioning

This is done by DVC (Data Version Control). The steps in which I used to create this DVC pipeline can be found on create_step.sh

Managing Experiments

DVC also takes care of the experiments. Which at least in my book is a huge win (one less system to learn and manage). The way I usually approach this problem is creating several branches each modifying a few lines and parameters and running dvc repro. Then comparing the results of that branch with other branches to see which will be promoted to main.

Model deployment

The model is evaluated agains the metrics saved on metrics.json. If it detects an improvement it should save the model into serving, using that as a trigger to deployment

ClaudioDavi/cd4ml

DVC + Github Actions

Data and Model Versioning

Managing Experiments

Model deployment