dvc-rawfile

The RAW file is stored compressed because it occupies about 60 MB, but the test runs against the uncompressed file (214089_JAI.raw).

The RAW file inside the tar.gz has the following md5sum:

fd0de1350b92b00d60afd53b015f6aea 214089_JAI.raw

But DVC calculates it as:

  • md5: 0b4d86bc06ee3260e8172b2196805382 size: 63232000 path: 214089_JAI.raw

This happens because DVC considers the file a text file and performs a dos2unix replacement (CRLF to LF) on its contents before hashing. See:

https://github.com/iterative/dvc/blob/1.11/dvc/utils/__init__.py#L39 -> https://github.com/iterative/dvc/blob/1.11/dvc/istextfile.py#L34

It still happens in version 2.4.3: https://github.com/iterative/dvc/blob/2.4.3/dvc/utils/__init__.py#L37 -> https://github.com/iterative/dvc/blob/2.4.3/dvc/istextfile.py#L22
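
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not DVC's actual code: `looks_like_text`, `md5_raw` and `md5_text_mode` are hypothetical helpers, and the text heuristic (no NUL byte and at most 30% non-text bytes in the inspected block) is an assumption about how a check like the linked istextfile.py behaves. It shows that binary data which happens to pass such a check gets a different MD5 once CRLF sequences are rewritten to LF.

```python
import hashlib

# Bytes treated as "text": printable ASCII plus common whitespace/control
# characters. This set is an assumption made for the sketch.
TEXT_CHARS = bytes(range(32, 127)) + b"\n\r\t\f\b"


def looks_like_text(block: bytes) -> bool:
    """Hypothetical approximation of a text-file heuristic (not DVC's code)."""
    if not block:
        return True  # an empty block is treated as text
    if b"\x00" in block:
        return False  # a NUL byte marks the block as binary
    # Remove every "text" byte and measure the share of what is left.
    nontext = block.translate(None, TEXT_CHARS)
    return len(nontext) / len(block) <= 0.30


def md5_raw(data: bytes) -> str:
    """MD5 of the bytes exactly as stored (what md5sum reports)."""
    return hashlib.md5(data).hexdigest()


def md5_text_mode(data: bytes) -> str:
    """MD5 after a dos2unix-style replacement of CRLF with LF."""
    return hashlib.md5(data.replace(b"\r\n", b"\n")).hexdigest()


# Stand-in for the start of a binary file that contains no NUL bytes
# but does contain the byte pair 0x0D 0x0A.
sample = b"ABCD\r\nEFGH" * 4

print("classified as text:", looks_like_text(sample))
print("raw md5:           ", md5_raw(sample))
print("text-mode md5:     ", md5_text_mode(sample))
```

Any consumer that recomputes the MD5 over the unmodified bytes will get the `md5_raw` value, so it will not match a hash that was computed in text mode.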

When the file is uploaded through the gocloud.dev library, the upload fails the MD5 check, because the MD5 calculated by DVC and the real MD5 of the file are not the same: https://github.com/google/go-cloud/blob/v0.23.0/blob/blob.go#L328