DVC and syrupy. .dvc tracking files deleted by "pytest --snapshot-update"
drjasonharrison opened this issue · 7 comments
Describe the bug
DVC is a tool that combines git and external storage to track large files that shouldn't be added to the git repository but that need to be tracked and stored. For an image file like __snapshots__/test_image/test_image.jpeg
running
```
dvc add __snapshots__/test_image/test_image.jpeg
```
will
- add a .gitignore entry for the file __snapshots__/test_image/test_image.jpeg
- create a file __snapshots__/test_image/test_image.jpeg.dvc to track the file
- execute git add __snapshots__/test_image/test_image.jpeg.dvc .gitignore
Then you use git commit; dvc push; git push to copy everything to the storage and repositories.
So the snapshots created by syrupy are still in the working copy of the repository, but now there are .dvc files next to them. A new git clone will not include the syrupy snapshots until dvc pull is executed.
When I ran pytest, syrupy reported
```
9 snapshots passed. 7 snapshots unused.
Re-run pytest with --snapshot-update to delete unused snapshots.
```
So I did pytest --snapshot-update and the .dvc files under the __snapshots__ directory were deleted. git restore brought them back.
To reproduce
Note: This uses bash commands to create a .dvc file that will be deleted by running pytest --snapshot-update. This simplifies the reproduction steps.
- mkdir dvc-and-syrupy
- cd dvc-and-syrupy
- pipenv --python 3.7
- pipenv shell
- git init
- wget https://raw.githubusercontent.com/tophat/syrupy/main/tests/examples/test_custom_image_extension.py
- python -m pip install pytest syrupy
- pytest --snapshot-update
- git status

      On branch master
      No commits yet
      Untracked files:
        (use "git add <file>..." to include in what will be committed)
          Pipfile
          __snapshots__/
          dvc-and-syrupy.md
          test_custom_image_extension.py
      nothing added to commit but untracked files present (use "git add" to track)

- echo "this is a test dvc file" > __snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc
- echo "/test_jpeg_image.jpg" >> __snapshots__/test_custom_image_extension/.gitignore
- git add __snapshots__/ test_custom_image_extension.py
- git status

      On branch master
      No commits yet
      Changes to be committed:
        (use "git rm --cached <file>..." to unstage)
          new file: __snapshots__/test_custom_image_extension/.gitignore
          new file: __snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc
          new file: test_custom_image_extension.py
      Untracked files:
        (use "git add <file>..." to include in what will be committed)
          Pipfile

- git commit -m "dvc tracking of snapshots"

      [master (root-commit) 90fb787] dvc tracking of snapshots
       3 files changed, 44 insertions(+)
       create mode 100644 __snapshots__/test_custom_image_extension/.gitignore
       create mode 100644 __snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc
       create mode 100644 test_custom_image_extension.py

- pytest --snapshot-update

      ===== test session starts =====
      platform linux -- Python 3.7.13, pytest-7.2.1, pluggy-1.0.0
      rootdir: /home/jason/mm/dvc-and-syrupy
      plugins: syrupy-3.0.6
      collected 1 item
      test_custom_image_extension.py . [100%]
      ----- snapshot report summary -----
      1 snapshot passed. 1 unused snapshot deleted.
      Deleted unknown snapshot fossil (__snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc)
      ===== 1 passed in 0.02s =====

- git status

      On branch master
      Changes not staged for commit:
        (use "git add/rm <file>..." to update what will be committed)
        (use "git restore <file>..." to discard changes in working directory)
          deleted: __snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc
      Untracked files:
        (use "git add <file>..." to include in what will be committed)
          Pipfile
      no changes added to commit (use "git add" and/or "git commit -a")

Expected behavior
It would be nice to have a way to tell syrupy that
certain files, or file patterns, under the snapshot folder should be ignored.
Screenshots
N/A
Environment (please complete the following information):
- OS: Ubuntu 18.04 but it's not an OS issue
- Syrupy Version: 2.
- Python Version: 3.7.13
Additional context
Right now I have in my repos, .gitignore, .dockerignore, .dvcignore, .jshintignore,
.npmignore, .prettierignore, .eslintignore..... Perhaps it's a sign that a tool has
reached a certain maturity when it accepts guidance on how to ignore unexpected files in
its sandbox.
Additional note: with the unused snapshots (the .dvc files) present, pytest exits with an error code. So even when every test passes, the presence of the .dvc files results in an error, which requires adjustments to CI/CD pipelines. One such adjustment is to add the --snapshot-warn-unused flag to the pytest command.
Hi Jason,
I saw your issue after submitting #703. I think the issues might be related, since the dvc file has the extension .jpg.dvc. We ran into a similar issue with .png.zip. I suspect that having multiple dots in the extension causes syrupy to behave unexpectedly.
We use exactly the same setup as @drjasonharrison for managing our snapshot files with DVC (very convenient!). We therefore run into the same issue: when running pytest with syrupy, it tells us "X snapshots unused." and returns a non-zero status code, even if no tests failed. Consequently, the related CI jobs fail.
It would be great if one could specify a "file ignore list" for syrupy.
Otherwise, it's a fantastic tool! Keep up the good work!
@hoffch add the --snapshot-warn-unused flag to the pytest command.
@tolgaeren I doubt that the problem is multiple dots in the filename. Syrupy assumes that all files in the snapshots folder were created by running pytest --snapshot-update. When pytest runs, syrupy tracks which files in the snapshots folder are accessed and errors or warns (see above) about unused files.
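The behaviour described above can be approximated as a set difference between the files found on disk and the files the tests touched. The following is only a rough sketch, not syrupy's actual implementation: the function name find_unused_files and its parameters are hypothetical, and skipping dotfiles is an assumption based on the reproduction, where .gitignore was left alone.

```python
from pathlib import Path
from typing import Iterable, Set


def find_unused_files(snapshot_dir: Path, used: Iterable[Path]) -> Set[Path]:
    """Return files under snapshot_dir that no test accessed during the run.

    Hypothetical helper for illustration; not part of syrupy.
    """
    discovered = {
        path
        for path in snapshot_dir.rglob("*")
        # Assumption: hidden files such as .gitignore are not treated as snapshots.
        if path.is_file() and not path.name.startswith(".")
    }
    # Anything on disk that was not accessed is reported as unused,
    # which is why a foo.jpg.dvc file next to foo.jpg gets flagged.
    return discovered - set(used)
```

Under this model, any file that another tool drops into the snapshots folder ends up in the unused set, regardless of how many dots its name contains.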
Possible solutions include a "file ignore list" such as a .syrupyignore file, or a better understanding of other tools. The first is general but lacks insight that each snapshot file will also have a separate tracking file. The second would require code changes for each additional tool that might make files in the snapshots folder.
Alternatively, DVC could be used to track the root snapshots folder rather than the individual snapshot files or test folders.
So that's two workarounds.
I'd prefer not to have tool specific logic in syrupy if it can be helped.
For that reason I like the "ignore" file/config approach.
For some context, the reason syrupy considers all file extensions as possible snapshots is that it's possible to write custom extensions with arbitrary file extensions. The single file extension makes this quite easy.
Another option would be a way to tell syrupy that, for every snapshot file, it should ignore the additional file with an appended suffix. This would allow dvc files that no longer track snapshots to be flagged as removable.
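To make the suffix-pairing idea concrete, here is a minimal sketch. The function name split_sidecars and the default ".dvc" suffix are assumptions made for illustration; nothing like this is known to exist in syrupy.

```python
from pathlib import Path
from typing import Iterable, Set, Tuple


def split_sidecars(
    files: Iterable[Path], suffix: str = ".dvc"
) -> Tuple[Set[Path], Set[Path], Set[Path]]:
    """Partition files into (snapshots, paired sidecars, orphaned sidecars).

    A sidecar is a file whose name is a snapshot's name plus `suffix`.
    Hypothetical helper for illustration only.
    """
    all_files = set(files)
    sidecars = {
        f for f in all_files
        if f.name.endswith(suffix) and len(f.name) > len(suffix)
    }
    snapshots = all_files - sidecars
    # A sidecar is "paired" when the file it tracks still exists.
    paired = {
        s for s in sidecars
        if s.with_name(s.name[: -len(suffix)]) in snapshots
    }
    return snapshots, paired, sidecars - paired
```

Paired sidecars would be ignored by the unused-snapshot check, while orphaned ones (tracking files whose snapshot is gone) could still be reported as removable.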
Unused snapshot detection logic for reference: https://github.com/tophat/syrupy/blob/5eee3d845cd3d3c4a47ea981911e5d4f8ebe83e0/src/syrupy/report.py#L202
and the default "discover_snapshots" method of the extension class: https://github.com/tophat/syrupy/blob/5eee3d845cd3d3c4a47ea981911e5d4f8ebe83e0/src/syrupy/extensions/base.py#L109
The ignore config is the simplest approach since we could just pass an ignore list to walk_snapshot_dir.
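As a rough illustration of the ignore-list approach, a directory walk that accepts glob patterns might look like the sketch below. This is not the real walk_snapshot_dir: the function name walk_snapshot_files, its ignore parameter, and the use of fnmatch-style patterns are all assumptions.

```python
import fnmatch
import os
from typing import Iterator, Sequence


def walk_snapshot_files(root: str, ignore: Sequence[str] = ()) -> Iterator[str]:
    """Yield file paths under root, skipping names that match an ignore pattern.

    Hypothetical stand-in for a snapshot directory walk; patterns use
    fnmatch syntax, e.g. "*.dvc".
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if any(fnmatch.fnmatch(name, pattern) for pattern in ignore):
                continue  # ignored files are never considered snapshots
            yield os.path.join(dirpath, name)
```

With ignore=["*.dvc"], the .dvc tracking files would never be discovered, so they could be neither reported as unused nor deleted by --snapshot-update.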
> Another option would be a way to tell syrupy that, for every snapshot file, it should ignore the additional file with an appended suffix.

It's an interesting idea; however, if someone is playing around with a custom extension and they rename the file extension, that may cause issues. It also somewhat conflicts with #703.