Install

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run pipelines

Run pipeline_a_segment projects

dvc exp run -R pipeline_a_segment/x
dvc exp run -R pipeline_a_segment/y
dvc exp run -R pipeline_a_segment/z

Run pipeline_b_detect projects

dvc exp run -R pipeline_b_detect/i
dvc exp run -R pipeline_b_detect/j

Run a pipeline (target/data/customer)

The run.sh script copies the project's dvc.yaml template for a given target, then runs the DVC pipeline for that target.

./run.sh PROJECT TARGET 

Arguments:

  • PROJECT - Path to a project/pipeline/model directory with a common template_dvc.yaml
  • TARGET - Name of the target/data/customer to apply DVC pipeline to

Examples

 # Run a segmentation pipeline for customer `x`
./run.sh pipeline_a_segment x
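
The copy-then-run behaviour described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual contents of run.sh. It assumes the template is copied to PROJECT/TARGET/dvc.yaml. The function name run_target and the DVC_BIN override (used only for dry runs) are invented for this example.

```shell
# Hypothetical sketch of run.sh; the real script may differ.
run_target() {
  project="$1"
  target="$2"
  template="$project/template_dvc.yaml"

  # Fail early if the project has no shared template.
  if [ ! -f "$template" ]; then
    echo "missing template: $template" >&2
    return 1
  fi

  # Assumption: each target keeps its own dvc.yaml in PROJECT/TARGET/.
  mkdir -p "$project/$target" || return 1
  cp "$template" "$project/$target/dvc.yaml" || return 1

  # DVC_BIN is a hypothetical override so the sketch can be dry-run with `echo`.
  "${DVC_BIN:-dvc}" exp run -R "$project/$target"
}

# Run only when both arguments are given, e.g. ./run.sh pipeline_a_segment x
if [ "$#" -eq 2 ]; then
  run_target "$1" "$2"
fi
```

Setting DVC_BIN=echo prints the DVC command instead of executing it, which is a cheap way to check the copy step without running a pipeline.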

Run multiple pipelines (list of targets)

The run_targets.sh script parses a list of targets and runs the DVC pipeline for each of them.

./run_targets.sh PROJECT TARGETS 

Arguments:

  • PROJECT - Path to a project/pipeline/model directory with a common template_dvc.yaml
  • TARGETS - Comma separated list of targets (no spaces in between)

Examples

# Run a pipeline for each target in the list
./run_targets.sh pipeline_a_segment x,y,z
./run_targets.sh pipeline_b_detect i,j
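
Splitting the comma-separated list and running one target at a time might look like the sketch below. It is hypothetical and assumes run_targets.sh delegates to ./run.sh for each target, which the text above does not confirm. The function name run_targets and the RUN_BIN override are invented for this example.

```shell
# Hypothetical sketch of run_targets.sh; the real script may differ.
run_targets() {
  project="$1"
  targets_csv="$2"

  # Turn "x,y,z" into one target per line, then run them in order.
  printf '%s\n' "$targets_csv" | tr ',' '\n' | while IFS= read -r target; do
    # Skip empty entries such as a trailing comma.
    [ -n "$target" ] || continue
    # RUN_BIN is a hypothetical override so the loop can be dry-run with `echo`.
    # stdin is redirected so the command cannot consume the remaining targets.
    "${RUN_BIN:-./run.sh}" "$project" "$target" < /dev/null || exit 1
  done
}

# Run only when both arguments are given, e.g. ./run_targets.sh pipeline_b_detect i,j
if [ "$#" -eq 2 ]; then
  run_targets "$1" "$2"
fi
```

The loop stops at the first failing target, so a broken pipeline does not silently let later targets run on stale results.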

Cloud Versioning Workflows

1 - Setup Remote Storages

Add local remote

mkdir -p /tmp/monorepo-reusable-pipelines
dvc remote add --local -d local /tmp/monorepo-reusable-pipelines

Add remote-i remote

dvc remote add remote-i s3://cse-cloud-version/monorepo-reusable-pipelines/remote-i/ 
dvc remote modify remote-i version_aware true

Add remote-j remote

dvc remote add remote-j s3://cse-cloud-version/monorepo-reusable-pipelines/pipeline_b_detect/j/ 
dvc remote modify remote-j version_aware true

2 - Use multiple Remote Storages

Notes:

  • In dvc.yaml, you can set a remote: field for the outputs to control which remote they use

Example

    outs:
      - pipeline_b_detect/i/results/metrics.json:
          remote: remote-i

2.1 - Run & persist pipeline_b_detect/i project

dvc exp run -R pipeline_b_detect/i
dvc push -r remote-i
git add . && git commit -m "New experiment - saved"

2.2 - Run & persist pipeline_b_detect/j project

dvc exp run -R pipeline_b_detect/j
dvc push -r remote-j
git add . && git commit -m "New experiment j - saved"

Expected Results

  • pipeline_b_detect/i/dvc.lock has only remote-i specified for outs
  • pipeline_b_detect/j/dvc.lock has only remote-j specified for outs
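
The expected results can be checked with a small helper like the one below. It is a hypothetical sketch and assumes the remote name appears in dvc.lock on lines of the form `remote: NAME`. The function name check_lock_remote is invented for this example.

```shell
# Hypothetical helper: succeed only if every `remote:` entry in a dvc.lock
# names the expected remote (and at least one such entry exists).
check_lock_remote() {
  lock_file="$1"
  expected="$2"

  # Collect the distinct remote names recorded in the lock file.
  found=$(grep -E '^[[:space:]]*remote:' "$lock_file" | awk '{print $2}' | sort -u)

  [ "$found" = "$expected" ]
}
```

For example, `check_lock_remote pipeline_b_detect/i/dvc.lock remote-i` returns a non-zero status if the lock file mentions any other remote or none at all.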

Changes

Oct 16, 2023 - Add metrics tracking

  • Metrics are saved with DVCLive (in the ml/src/train.py script).
  • For project pipeline_a_segment, metrics are saved in PROJECT/dvclive/; the metrics and plot files are added automatically to the root dvc.yaml and versioned with Git.
  • For project pipeline_b_detect, metrics are saved in PROJECT/results/; the metrics and plot files are specified in dvc.yaml (as outs in the train stage) and versioned with DVC (not Git).
  • In both cases, DVC updates metrics and plots in Studio and VS Code in real time.

Oct 17, 2023 - 2 - Use multiple Remote Storages

  • The pipeline_b_detect projects each use a different remote storage, with version_aware set to true.