syrupy-project/syrupy

DVC and syrupy. .dvc tracking files deleted by "pytest --snapshot-update"

drjasonharrison opened this issue · 7 comments

Describe the bug

DVC is a tool that combines git and external storage to track large files that shouldn't be added to the git repository but that need to be tracked and stored. For a PNG file like __snapshots__/test_image/test_image.jpeg running

```
dvc add __snapshots__/test_image/test_image.jpeg
```

will

  1. add a .gitignore entry for the file __snapshots__/test_image/test_image.jpeg
  2. create a file __snapshots__/test_image/test_image.jpeg.dvc to track the file
  3. execute git add __snapshots__/test_image/test_image.jpeg.dvc .gitignore

Then you use git commit; dvc push; git push to copy everything to the storage and repositories.

So the snapshots created by syrupy are still in the working copy of the repository, but now there are .dvc files next to them. A new git clone will not include the syrupy snapshots until dvc pull is executed.

When I ran pytest, syrupy reported

```
9 snapshots passed. 7 snapshots unused.

Re-run pytest with --snapshot-update to delete unused snapshots.
```

So I did pytest --snapshot-update and the .dvc files under the __snapshots__ directory
were deleted. git restore brought them back.

To reproduce

Note: This uses bash commands to create a .dvc file that will be deleted
by running pytest --snapshot-update. This simplifies the reproduction steps.

  1. mkdir dvc-and-syrupy

  2. cd dvc-and-syrupy

  3. pipenv --python 3.7

  4. pipenv shell

  5. git init

  6. wget https://raw.githubusercontent.com/tophat/syrupy/main/tests/examples/test_custom_image_extension.py

  7. python -m pip install pytest syrupy

  8. pytest --snapshot-update

  9. git status

    On branch master
    
    No commits yet
    
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
            Pipfile
            __snapshots__/
            dvc-and-syrupy.md
            test_custom_image_extension.py
    
    nothing added to commit but untracked files present (use "git add" to track)
  10. echo "this is a test dvc file" > snapshots/test_custom_image_extension/test_jpeg_image.jpg.dvc

  11. echo "/test_jpeg_image.jpg" >> snapshots/test_custom_image_extension/.gitignore

  12. git add snapshots/ test_custom_image_extension.py

  13. git status

    On branch master
    
    No commits yet
    
    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)
            new file:   __snapshots__/test_custom_image_extension/.gitignore
            new file:   __snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc
            new file:   test_custom_image_extension.py
    
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
            Pipfile
  14. git commit -m "dvc tracking of snapshots"

    [master (root-commit) 90fb787] dvc tracking of snapshots
     3 files changed, 44 insertions(+)
     create mode 100644 __snapshots__/test_custom_image_extension/.gitignore
     create mode 100644 __snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc
     create mode 100644 test_custom_image_extension.py
  15. pytest --snapshot-update

    =============================================================== test session starts ===============================================================
    platform linux -- Python 3.7.13, pytest-7.2.1, pluggy-1.0.0
    rootdir: /home/jason/mm/dvc-and-syrupy
    plugins: syrupy-3.0.6
    collected 1 item
    
    test_custom_image_extension.py .                                                                                                            [100%]
    
    ------------------------------------------------------------- snapshot report summary -------------------------------------------------------------
    1 snapshot passed. 1 unused snapshot deleted.
    
    Deleted unknown snapshot fossil (__snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc)
    ================================================================ 1 passed in 0.02s ================================================================
  16. git status

    On branch master
    Changes not staged for commit:
      (use "git add/rm <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
            deleted:    __snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc
    
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
            Pipfile
    
    no changes added to commit (use "git add" and/or "git commit -a")

Expected behavior

It would be nice to have a way to tell syrupy that
certain files, or file patterns, under the snapshot folder should be ignored.

Screenshots

N/A

Environment (please complete the following information):

  • OS: Ubuntu 18.04 but it's not an OS issue
  • Syrupy Version: 2.
  • Python Version: 3.7.13

Additional context

Right now I have in my repos, .gitignore, .dockerignore, .dvcignore, .jshintignore,
.npmignore, .prettierignore, .eslintignore..... Perhaps it's a sign that a tool has
reached a certain maturity when it accepts guidance on how to ignore unexpected files in
it's sandbox.

Additional note, with the unused snapshots (the .dvc files) pytest exits with an error code. So while every test can pass, the presence of the .dvc files results in a error which will require adjustments to CI/CD pipelines. One such adjustment is to add the --snapshot-warn-unused flag to the pytest command.

Hi Jason,

I saw your issue after submitting this #703 . I think the issues might be related since the dvc file has the following extension .jpg.dvc. We ran into a similar issue with .png.zip. I am thinking basically having an multiple. in the extension causes syrupy to behave unexpectedly.

We use exactly the same setup as @drjasonharrison for managing our snapshot files with DVC (very convenient!). We hence run into the same issues: when running pytest with syrupy, it tells us X snapshots unused. and returns a non-zero status code, even if no tests failed. Consequentially, the related CI jobs fails.

Would be great one could specify a "file ignore list" for syrupy.

Otherwise, it's a fantastic tool! Keep up the good work!

@hoffch add the --snapshot-warn-unused flag to the pytest command.

@tolgaeren I doubt that the problem is multiple dots in the filename. Syrupy is assuming that all files in the snapshots folder are from running pytest --update-snapshots. When pyest runs syrupy tracks which files in the snapshots folder are accessed and errors or warns (see above) about unused files.

Possible solutions include a "file ignore list" such as a .syrupyignore file, or a better understanding of other tools. The first is general but lacks insight that each snapshot file will also have a separate tracking file. The second would require code changes for each additional tool that might make files in the snapshots folder.

Alternatively, DVC could be used to track the root snapshots folder rather than the individual snapshot files or test folders.

So that's two workarounds.

I'd prefer not to have tool specific logic in syrupy if it can be helped.

For that reason I like the "ignore" file/config approach.

For some context, the reason syrupy considers all file extensions as possible snapshots is because it's possible to write custom extensions with arbitrary extensions. The single file extension makes this quite easy.

Another option would be a method to tell syrupy that for every snapshot file ignore the additional file with appended end. This would allow dvc files that no longer track snapshots to be flagged as removable.

Unused snapshot detection logic for reference: https://github.com/tophat/syrupy/blob/5eee3d845cd3d3c4a47ea981911e5d4f8ebe83e0/src/syrupy/report.py#L202

and the default "discover_snapshots" method of the extension class: https://github.com/tophat/syrupy/blob/5eee3d845cd3d3c4a47ea981911e5d4f8ebe83e0/src/syrupy/extensions/base.py#L109

The ignore config is the simplest approach since we could just pass an ignore list to walk_snapshot_dir.

Another option would be a method to tell syrupy that for every snapshot file ignore the additional file with appended end.

It's an interesting idea, however if someone is playing around with a custom extension and they rename the file extension, that may cause issues. It also somewhat conflicts with #703.