workshop-uncool-mlops

Overview

Completed solution: https://github.com/iterative/workshop-uncool-mlops-solution

Before MLOps

We have a DVC Pipeline defined in the dvc.yaml file.

The pipeline is composed of stages that run Python scripts defined in src:

flowchart TD
        node1[compute-data-metrics]
        node2[eval]
        node3[get-data]
        node4[split-data]
        node5[train]
        node3-->node1
        node3-->node4
        node4-->node2
        node4-->node5
        node5-->node2
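The arrows in the flowchart mean that a stage can only run after the stages that feed it. As a rough illustration of that ordering (not a description of how DVC works internally), the edges can be passed to the standard `tsort` utility, which prints one valid execution order:

```shell
# Edges copied from the flowchart, one "upstream downstream" pair per line.
edges='get-data compute-data-metrics
get-data split-data
split-data eval
split-data train
train eval'

# tsort prints one valid topological order; ties may be broken differently across systems.
order="$(printf '%s\n' "$edges" | tsort)"
echo "$order"
```

Whatever the exact output, get-data comes first and train always precedes eval.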

We use DVC Params, defined in params.yaml, to configure the pipeline.

The pipeline can be reproduced locally:

Local Reproducibility
git clone git@github.com:iterative/workshop-uncool-mlops.git
cd workshop-uncool-mlops
pip install -r requirements.txt
dvc repro

This generates DVC Metrics and DVC Plots to evaluate model performance, which can be found in outs.

These files are small enough to be tracked by git, so after we run the pipeline we can share the results with others:

git add dvc.lock outs
git commit -m "Update pipeline results"
git push

Viewing Results in Studio

https://studio.iterative.ai/user/daavoo/views/workshop-uncool-mlops-solution-ix8fxl0eob

More info:

https://dvc.org/doc/studio

Towards MLOps

MLOps

You should be able to follow all the steps below without leaving the browser.


1. Fork this repo


2. Open fork in online code editor

Navigate to your fork and press the . key, or change the URL from "github.com" to "github.dev".


3. Set up DVC Remote

DVC remotes provide a location to store arbitrarily large files and directories.

First, create a new folder in your Google Drive, navigate to it, and copy the last part of the URL (the folder ID).
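If you prefer not to pick the last part of the URL out by hand, shell parameter expansion can do it. This is a minimal sketch; the folder URL below is a made-up example, not a real folder.

```shell
# Hypothetical folder URL, as copied from the browser's address bar.
folder_url="https://drive.google.com/drive/folders/1aBcD_example-folder-id?usp=sharing"

# Drop any query string (such as ?usp=sharing), then keep what follows the last slash.
folder_id="${folder_url%%\?*}"
folder_id="${folder_id##*/}"

echo "gdrive://${folder_id}"
```

The printed value has the form expected as the remote URL in the next step.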

You can now add a DVC remote to your project:

From web

Update .dvc/config:

https://github.com/iterative/workshop-uncool-mlops-solution/blob/main/.dvc/config
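For orientation, a `.dvc/config` with a default Google Drive remote generally looks like the file written below. This is a sketch under assumptions: the remote name `gdrive` and the `YOUR_FOLDER_ID` placeholder are illustrative, and the linked file in the solution repository is the authoritative version. The snippet writes to a scratch directory so it does not touch a real repository.

```shell
# Scratch directory so the sketch does not touch a real repository.
scratch="$(mktemp -d)"
mkdir -p "$scratch/.dvc"

# Placeholder for the folder ID copied from the Google Drive URL.
folder_id="YOUR_FOLDER_ID"

# "gdrive" is an arbitrary remote name chosen for this sketch.
cat > "$scratch/.dvc/config" <<EOF
[core]
    remote = gdrive
['remote "gdrive"']
    url = gdrive://${folder_id}
EOF

cat "$scratch/.dvc/config"
```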

From CLI
# "gdrive" is the remote name; any name works
dvc remote add --default gdrive gdrive://{YOUR_URL}

More info:

https://dvc.org/doc/command-reference/remote/add#description

Other remote?:

https://dvc.org/doc/command-reference/remote/add#supported-storage-types


The results of the pipeline can now be shared with others by using dvc push and dvc pull.

You will be prompted for Google Drive credentials the first time you run dvc push/pull.

Shared Reproducibility
# Researcher A
# Updates hparam
dvc repro
git add .
git commit -m "Updated hparam"
git push && dvc push
# Researcher B
git pull && dvc pull
# Receives all changes

4. Reproducibility from anywhere, by anyone

You need to grant GitHub access to the DVC Remote:

From web
From CLI
  • Get the credentials:
cat ".dvc/tmp/gdrive-user-credentials.json"
  • Create a new GitHub Secret: secrets.GDRIVE_CREDENTIALS_DATA
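If the GitHub CLI (`gh`) is installed and authenticated for your fork, the secret can also be created without opening the web settings page. The following is a sketch, not part of the original workshop; the `upload_status` variable is only there to make the outcome explicit.

```shell
# Path taken from the step above; it is expected to exist once DVC has authenticated with Google Drive.
cred_file=".dvc/tmp/gdrive-user-credentials.json"

if [ ! -f "$cred_file" ]; then
    upload_status="missing-credentials"
    echo "Run 'dvc push' or 'dvc pull' once so that $cred_file exists" >&2
elif ! command -v gh >/dev/null 2>&1; then
    upload_status="missing-gh"
    echo "GitHub CLI not found; create the secret from the repository settings instead" >&2
elif gh secret set GDRIVE_CREDENTIALS_DATA < "$cred_file"; then
    # gh reads the secret value from standard input.
    upload_status="uploaded"
else
    upload_status="failed"
fi

echo "$upload_status"
```

Run it from the root of your fork so that `gh` targets the right repository.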

Then, you can create a workflow that runs when a Pull Request is created:

Create and fill `.github/workflows/on_pr.yml` https://github.com/iterative/workshop-uncool-mlops-solution/blob/main/.github/workflows/on_pr.yml

And now you can reproduce the pipeline from the web:

From GitHub
  • Edit params.yaml from the GitHub Interface.

  • Change train.epochs.

  • Select Create a new branch for this commit and start a pull request

iterative/workshop-uncool-mlops-solution#10
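The same parameter change can be scripted. The sketch below works on a throwaway copy with made-up contents; only the `train.epochs` key comes from the steps above, and the values 10 and 20 are assumptions for illustration.

```shell
# Work on a throwaway copy so the sketch has no side effects.
workdir="$(mktemp -d)"

# Hypothetical params.yaml contents; the real file has more keys.
printf 'train:\n  epochs: 10\n' > "$workdir/params.yaml"

new_epochs=20

# Rewrite the indented "epochs:" line. Note this would match an "epochs:" key
# in any section, so check the diff before committing in a real repository.
sed "s/^\( *epochs:\).*/\1 ${new_epochs}/" "$workdir/params.yaml" > "$workdir/params.yaml.new"
mv "$workdir/params.yaml.new" "$workdir/params.yaml"

cat "$workdir/params.yaml"
```

In your fork, the edit would be followed by committing on a new branch, pushing it, and opening a pull request, which triggers the workflow.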

From Studio

iterative/workshop-uncool-mlops-solution#8


More compute?:

https://cml.dev/doc/self-hosted-runners


5. Automation

You can also create a workflow that runs on a daily schedule:

Create and fill `.github/workflows/daily.yml` https://github.com/iterative/workshop-uncool-mlops-solution/blob/main/.github/workflows/daily.yml


6. Deployment

For deployment, you can create a workflow that builds and deploys a Docker image.

Create and fill `Dockerfile` https://github.com/iterative/workshop-uncool-mlops-solution/blob/main/Dockerfile
Create and fill `.github/workflows/deploy_model.yml` https://github.com/iterative/workshop-uncool-mlops-solution/blob/main/.github/workflows/deploy_model.yml

You can use the published image from anywhere:

Create and fill `.github/workflows/issue_labeler.yml`

https://github.com/iterative/workshop-uncool-mlops-solution/blob/main/.github/workflows/issue_labeler.yml

See predictions on a newly created issue:

https://github.com/iterative/workshop-uncool-mlops-solution/runs/5338934537?check_suite_focus=true#step:2:50

Or use it from anywhere:

docker run "ghcr.io/iterative/workshop-uncool-mlops-solution:main" "dvc pull fails when using my S3 remote"