python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
dvc exp run -R pipeline_a_segment/x
dvc exp run -R pipeline_a_segment/y
dvc exp run -R pipeline_a_segment/z
dvc exp run -R pipeline_b_detect/i
dvc exp run -R pipeline_b_detect/j
The run.sh script copies the dvc.yaml template for a target division and then runs a DVC pipeline.
./run.sh PROJECT TARGET
Arguments:
- PROJECT - Path to a project/pipeline/model directory with a common template_dvc.yaml
- TARGET - Name of the target/data/customer to apply DVC pipeline to
Examples:
# Run a segmentation pipeline for customer `x`
./run.sh pipeline_a_segment x
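A minimal sketch of the logic described above, assuming run.sh copies PROJECT/template_dvc.yaml verbatim to PROJECT/TARGET/dvc.yaml (the layout implied by commands such as dvc exp run -R pipeline_a_segment/x) and then runs that pipeline. The function name run_target is hypothetical, and the real script may fill target-specific values into the template instead of copying it as-is.

```shell
#!/bin/sh
set -eu

# Hypothetical sketch of run.sh (not the actual script).
# Assumption: PROJECT/template_dvc.yaml is copied verbatim to
# PROJECT/TARGET/dvc.yaml before the pipeline runs.
run_target() {
  local project="$1" target="$2"
  local template="${project}/template_dvc.yaml"
  local target_dir="${project}/${target}"

  if [ ! -f "$template" ]; then
    echo "error: ${template} not found" >&2
    return 1
  fi

  mkdir -p "$target_dir"
  cp "$template" "${target_dir}/dvc.yaml"

  # Run only this target's pipeline, e.g. dvc exp run -R pipeline_a_segment/x
  dvc exp run -R "$target_dir"
}

# Usage (requires DVC): run_target pipeline_a_segment x
```

Keeping one template per project and generating the per-target dvc.yaml means a pipeline change is made once and picked up by every target on the next run.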
The run_targets.sh script parses a list of targets and runs the DVC pipeline for each of them.
./run_targets.sh PROJECT TARGETS
Arguments:
- PROJECT - Path to a project/pipeline/model directory with a common template_dvc.yaml
- TARGETS - Comma-separated list of targets (no spaces in between)
Examples:
# Run the segmentation pipeline for targets x, y, z and the detection pipeline for targets i, j
./run_targets.sh pipeline_a_segment x,y,z
./run_targets.sh pipeline_b_detect i,j
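The parsing step can be sketched in POSIX shell, assuming run_targets.sh simply calls run.sh once per target. RUN_SH is a hypothetical override hook (defaulting to ./run.sh), included only so the loop can be exercised without running DVC.

```shell
#!/bin/sh
set -eu

# Hypothetical sketch of run_targets.sh: split a comma-separated TARGETS
# list and invoke run.sh once per target.
# RUN_SH is an assumed override hook; it defaults to ./run.sh.
run_targets() {
  local project="$1" targets_csv="$2"
  local runner="${RUN_SH:-./run.sh}"
  local target old_ifs="$IFS"

  if [ -z "$targets_csv" ]; then
    echo "usage: run_targets PROJECT TARGETS (e.g. x,y,z)" >&2
    return 1
  fi

  set -f                 # no globbing while splitting the list
  IFS=','                # split on commas only
  for target in $targets_csv; do
    IFS="$old_ifs"
    [ -n "$target" ] || continue   # skip empty entries such as "x,,y"
    "$runner" "$project" "$target"
  done
  IFS="$old_ifs"
  set +f
}

# Usage: run_targets pipeline_b_detect i,j
```

Splitting with IFS instead of an external tool keeps the script dependency-free, and targets run sequentially, so a failing target stops the loop under set -e.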
Add local remote
mkdir /tmp/monorepo-reusable-pipelines
dvc remote add --local -d local /tmp/monorepo-reusable-pipelines
Add remote-i remote
dvc remote add remote-i s3://cse-cloud-version/monorepo-reusable-pipelines/remote-i/
dvc remote modify remote-i version_aware true
Add remote-j remote
dvc remote add remote-j s3://cse-cloud-version/monorepo-reusable-pipelines/pipeline_b_detect/j/
dvc remote modify remote-j version_aware true
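Each S3 remote is set up with the same pair of commands, so they can be wrapped in a small helper, assuming one version-aware remote per target as in the remote-i and remote-j examples. The function name add_versioned_remote is hypothetical; the bucket URL is passed in unchanged by the caller.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: register a remote and enable version-aware storage
# on it (the same two commands used for remote-i and remote-j).
add_versioned_remote() {
  local name="$1" url="$2"
  dvc remote add "$name" "$url"
  dvc remote modify "$name" version_aware true
}

# Usage (requires DVC and access to the bucket):
# add_versioned_remote remote-j s3://cse-cloud-version/monorepo-reusable-pipelines/pipeline_b_detect/j/
```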
Notes:
- In dvc.yaml, you can set a remote: field on outputs to control which remote each output uses
Example:
outs:
  - pipeline_b_detect/i/results/metrics.json:
      remote: remote-i
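To see which remotes a dvc.yaml assigns to its outputs, a rough line-based sketch can list the distinct remote: values. It assumes one remote: NAME entry per line, as in the example above, and is plain text matching rather than a YAML parser; list_out_remotes is a hypothetical name.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the distinct values of "remote:" fields in a
# dvc.yaml file. Line-based text matching, not a real YAML parser.
list_out_remotes() {
  sed -n 's/^[[:space:]]*remote:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | sort -u
}

# Usage: list_out_remotes pipeline_b_detect/i/dvc.yaml
```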
2.1 - Run & persist the pipeline_b_detect/i project
dvc exp run -R pipeline_b_detect/i
dvc push -r remote-i
git add . && git commit -m "New experiment - saved"
2.2 - Run & persist the pipeline_b_detect/j project
dvc exp run -R pipeline_b_detect/j
dvc push -r remote-j
git add . && git commit -m "New experiment j - saved"
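Steps 2.1 and 2.2 follow the same run, push, commit sequence, so they can be sketched as one helper. The function name persist_target and its argument order are hypothetical; the commands inside are the same ones used in both steps.

```shell
#!/bin/sh
set -eu

# Hypothetical helper for the "Run & persist" steps: run one target's
# pipeline, push DVC-tracked data using the given remote, then commit.
persist_target() {
  local target_dir="$1" remote="$2" message="$3"
  dvc exp run -R "$target_dir"
  dvc push -r "$remote"
  git add .
  git commit -m "$message"
}

# Usage (requires DVC, Git and access to the remote):
# persist_target pipeline_b_detect/i remote-i "New experiment - saved"
# persist_target pipeline_b_detect/j remote-j "New experiment j - saved"
```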
Expected Results:
- pipeline_b_detect/i/dvc.lock has only remote-i specified for outs
- pipeline_b_detect/j/dvc.lock has only remote-j specified for outs
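A rough way to check these results from the shell, assuming that for version-aware remotes dvc.lock records each pushed output under a cloud: key named after the remote, and that remote names follow the remote-* pattern used here. check_lock_remotes is a hypothetical helper that does plain text matching, not YAML parsing.

```shell
#!/bin/sh
set -eu

# Hypothetical check: succeed only if the set of distinct "remote-*" names
# mentioned in LOCKFILE is exactly EXPECTED.
check_lock_remotes() {
  local lockfile="$1" expected="$2" found
  found=$(grep -o 'remote-[A-Za-z0-9_-]*' "$lockfile" | sort -u || true)
  if [ "$found" = "$expected" ]; then
    echo "OK: ${lockfile} only references ${expected}"
  else
    echo "FAIL: ${lockfile} references: ${found:-<none>}" >&2
    return 1
  fi
}

# Usage:
# check_lock_remotes pipeline_b_detect/i/dvc.lock remote-i
# check_lock_remotes pipeline_b_detect/j/dvc.lock remote-j
```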
Oct 16, 2023 - Add metrics tracking
- Metrics are saved with DVCLive (in the ml/src/train.py script)
- For project pipeline_a_segment:
  - metrics are saved in PROJECT/dvclive/
  - metrics/plot files are automatically added to the root dvc.yaml
  - metrics are versioned with Git
- For project pipeline_b_detect:
  - metrics are saved in PROJECT/results/
  - metrics/plot files are specified in dvc.yaml (as outs in the train stage)
  - metrics are versioned with DVC (not Git)
- In both cases, DVC updates metrics/plots in Studio/VS Code in real time
Oct 17, 2023 - 2 - Use multiple Remote Storages
- The pipeline_b_detect projects (i and j) use different Remote Storages (version_aware=True)