Data Version Control (DVC) tutorial
-
First, create a Git repository and clone it to your local computer. Run all of the following commands inside that clone.
- install DVC
pip install dvc
- initialize a DVC project (inside the cloned Git repository)
dvc init
git commit -m "Initialize DVC"
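- add remote storage (Google Drive; <folderID> is the ID at the end of the folder's URL)
pip install dvc-gdrive
dvc remote add -d mygdrive gdrive://<folderID>
dvc remote list
git commit -am "Configure DVC remote storage"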
- add the first version of the data
dvc get https://github.com/iterative/dataset-registry tutorials/versioning/data.zip
unzip data.zip && rm -f data.zip
dvc add data/
dvc push
git add data.dvc .gitignore
git commit -m "Add first version of data/"
git tag -a "v1.0" -m "data v1.0, 1000 images"
git push --follow-tags
- add a new version of the data
dvc get https://github.com/iterative/dataset-registry tutorials/versioning/new-labels.zip
unzip new-labels.zip && rm -f new-labels.zip
dvc add data/
dvc diff
git diff data.dvc
git commit -am "New version of data/ with more training images"
git tag -a "v2.0" -m "data v2.0, 2000 images"
dvc push
git push --follow-tags