DVC - demo 1
MAKE SURE VERSION 3.50 HAS BEEN INSTALLED
2 cd .\DVC\first-demo\
3 git init 4 dvc init 5 git status 6 git commit -m "initializing dvc" 9 dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
mkdir C:\SOFTWARES\dvcstore
11 dvc remote add -d myremote C:\SOFTWARES\dvcstore 12 dvc push 13 dvc pull 14 dvc add data/data.xml 15 dvc push 16 dvc pull
UPDATE EXISTING FILE AND DO VERSIONING
21 copy data/data.xml c:\temp\data.xml 22 type c:\temp\data.xml >> data/data.xml 23 dvc add data/data.xml 24 dvc push 28 git add data/data.xml.dvc 29 git commit -m "Dataset updates" 30 dvc pull 31 git log 33 git checkout HEAD~1 35 git log 36 dvc checkout
UNDER "DVC" FOLDER
git clone https://github.com/vishymails/code.git
RENAME "CODE" FOLDER TO "pipeline-demo"
19 git init 20 dvc init
dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
21 dvc add data/data.xml
28 pip install -r src/requirements.txt
31 git add . 32 git status 33 git commit -m "initial commit"
36 dvc stage add -n prepare -p prepare.seed,prepare.split -d src/prepare.py -d data/data.xml -o data/prepared python src/prepare.py data/data.xml
37 dvc stage add -n featurize -p featurize.max_features,deaturize.ngrams -d src/featurization.py -d data/prepared -o data/features python src/featurization.py data/prepared data/features 38 dvc stage add -n train -p train.seed.train.n_est,train.min_split -d src/train.py -d data/features -o model.pkl python src/train.py data/features model.pkl 39 git add . 40 git commit -m "pipeline defined" 41
45 dvc repro
IF YOU FACE ERRORS CHECK prepare.py
line 59 add encoding parameter
with open(input, encoding="utf-8") as fd_in:
input_lines = fd_in.readlines()
dvc repro
MODIFY PIPELINE BY ADDING NEW STAGE
28 dvc repro 29 dvc stage add -n evaluate -d src/evaluate.py -d model.pkl -d data/features -o eval python src/evaluate.py model.pkl data/features 30 dvc dag 31 dvc repro 32 git add . 33 git status 34 git commit -a -m "create eval stage"
GENERATED DATA CAN BE VIEWED IN METRICS EDITOR
35 dvc metrics show 36 dvc plots show 37 history
MAKE CHANGES AND SEE THE DIFFERENCE
PARAMS.YAML
-----------
prepare:
split: 0.20 seed: 20170428
featurize: #max_features: 100 #ngrams: 1 max_features: 200 ngrams: 2
train: seed: 20170428 n_est: 50 min_split: 0.01
TRY THESE COMMANDS AFTER SAVING PARAMS.YAML FILE
38 dvc repro
39 dvc params diff 40 dvc metrics diff 41 dvc plots diff 42 history