Benchmarking dvc with pytest-benchmark.
Setup:
$ pip install -r requirements.txt
$ dvc pull    # optional; otherwise datasets will be pulled dynamically during the tests
Run the whole suite or a single benchmark module:
$ pytest
$ pytest tests/benchmarks/cli/commands/test_add.py

The benchmark-specific options are listed in the pytest help:
$ pytest -h
...
  --size={tiny,small,large}
                        Size of the dataset/datafile to use in tests
                        (default: small)
  --remote={azure,gdrive,gs,hdfs,http,oss,s3,ssh,webdav}
                        Remote type to use in tests (default: local)
...
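These are plain pytest options. As an illustration only (not the suite's actual conftest), they could be registered and exposed to tests roughly like this; the fixture names `dataset_size`/`remote_type` are made up for the sketch:

```python
# conftest.py -- illustrative sketch of registering the --size/--remote options;
# not dvc's real implementation, fixture names are assumptions.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--size",
        choices=["tiny", "small", "large"],
        default="small",
        help="Size of the dataset/datafile to use in tests",
    )
    parser.addoption(
        "--remote",
        # "local" is included in choices here only so the default validates in this sketch
        choices=["local", "azure", "gdrive", "gs", "hdfs", "http", "oss", "s3", "ssh", "webdav"],
        default="local",
        help="Remote type to use in tests",
    )


@pytest.fixture
def dataset_size(request):
    # tests can use this to decide how much data to generate/pull
    return request.config.getoption("--size")


@pytest.fixture
def remote_type(request):
    return request.config.getoption("--remote")
```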
To compare saved benchmark runs and generate histograms and a CSV report:
$ pytest-benchmark compare --histogram histograms/ --group-by name --sort name --csv results.csv
For example, to compare two dvc releases, run the benchmarks against each version and then compare the results as shown above:
$ pip install dvc==2.5.4
$ pytest
$ pip install dvc==2.6.0
$ pytest
tests/benchmarks structure:
- cli: these run dvc through its command line, so they should work with any dvc installation (rpm, deb, pypi, snap, etc) and could be used in the dvc-test repo too.
  - commands: granular benchmarks for individual commands. These should have a cached setup, so that we could use them during rapid development instead of our hand-written scripts (a minimal sketch of such a test is shown after this list). Every test could be run on a separate machine.
  - stories: multistage start-to-end benchmarks, useful for testing workflows (e.g. from the documentation; see test_sharing, inspired by the sharing-data-and-models use case). Every full story could be run on a separate machine.
- api: for the python api only.
  - methods: granular benchmarks for individual methods (e.g. api.open/read). Same reasoning as in cli.commands; see the second sketch below.
  - stories: same as cli.stories, but for our api, e.g. using our api together with pandas or something similar.
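A rough sketch of what a granular command benchmark could look like. This is a hypothetical test, not the real test_add; it shells out to whatever dvc binary is installed, which is what lets these benchmarks run against any dvc package:

```python
# Sketch of a tests/benchmarks/cli/commands-style benchmark (illustrative only).
import shutil
import subprocess


def test_add(benchmark, tmp_path, monkeypatch):
    monkeypatch.chdir(tmp_path)
    subprocess.run(["git", "init"], check=True)
    subprocess.run(["dvc", "init"], check=True)
    (tmp_path / "data").write_bytes(b"0" * 1024 * 1024)  # stand-in for a real dataset

    def reset():
        # undo the previous round so every round benchmarks a fresh `dvc add`
        subprocess.run(["dvc", "remove", "data.dvc"], check=False)
        shutil.rmtree(tmp_path / ".dvc" / "cache", ignore_errors=True)

    benchmark.pedantic(
        lambda: subprocess.run(["dvc", "add", "data"], check=True),
        setup=reset,
        rounds=3,
    )
```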
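And a similar sketch for an api method benchmark; the repository URL and file path below are placeholders (it assumes network access to that public repo), not fixtures from the real suite:

```python
# Sketch of a tests/benchmarks/api/methods-style benchmark (illustrative only).
import dvc.api


def test_read(benchmark):
    # benchmark the pure-python entry point instead of the CLI
    data = benchmark(
        dvc.api.read,
        "data/data.xml",
        repo="https://github.com/iterative/example-get-started",
    )
    assert data
```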