vercel/turborepo

Pruning results in different hashes for files with CRLF line endings


Verify canary release

  • I verified that the issue exists in the latest Turborepo canary release.

Link to code that reproduces this issue

https://github.com/cjquines/turborepo-gitattributes/

Which canary version will you have in your reproduction?

turbo 2.3.4-canary.2

Environment information

CLI:
   Version: 2.3.4-canary.2
   Path to executable: /home/runner/work/turborepo-gitattributes/turborepo-gitattributes/node_modules/.pnpm/turbo-linux-64@2.3.4-canary.2/node_modules/turbo-linux-64/bin/turbo
   Daemon status: Not running
   Package manager: pnpm

Platform:
   Architecture: x86_64
   Operating system: linux
   WSL: false
   Available memory (MB): 14810
   Available CPU cores: 4

Environment:
   CI: Some(
    "GitHub Actions",
)
   Terminal (TERM): unknown
   Terminal program (TERM_PROGRAM): unknown
   Terminal program version (TERM_PROGRAM_VERSION): unknown
   Shell (SHELL): unknown
   stdin: true

Expected behavior

File hashes for README-dos-dos.md and README-unix-dos.md should be the same before and after running turbo prune.

Actual behavior

File hashes are different.

To Reproduce

Check out the repo and run ./test.sh. Observe that the hashes of some files differ before and after pruning.

Additional context

Might have to do with .gitattributes, but not sure. CRLF is always weird.

I see the issue: we're calling git_odb_hashfile, which carries this disclaimer:

Similar functionality to git.git's git hash-object without the -w flag, however, with the --no-filters flag

--no-filters
Hash the contents as is, ignoring any input filter that would have been chosen by the attributes mechanism, including the end-of-line conversion. If the file is read from standard input then this is always implied, unless the --path option is given.

The files outside out/ have hashes that respect .gitattributes because those hashes were written to the object database when the files were committed. I will look into whether we can switch to git_repository_hashfile, which can respect .gitattributes. (Update: the library we use doesn't expose a binding for it, so there might be a delay while I implement it.)
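
To illustrate why the two code paths disagree, here is a minimal Python sketch of how Git derives a blob hash, applied to the same CRLF file with and without end-of-line conversion. The helper names (git_blob_sha1, normalize_eol) and the sample file contents are made up for this example, and normalize_eol is only a simplified stand-in for Git's real input filter, which also depends on the attributes that apply to the path.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the SHA-1 Git assigns to a blob holding exactly these bytes."""
    # Git hashes a header ("blob <size>\0") followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def normalize_eol(content: bytes) -> bytes:
    """Simplified stand-in (assumption) for Git's CRLF-to-LF input conversion."""
    return content.replace(b"\r\n", b"\n")


# Hypothetical file contents with CRLF line endings in the working tree.
crlf_file = b"# README\r\nhello\r\n"

# Hashing the bytes as-is, comparable to `git hash-object --no-filters`.
raw_hash = git_blob_sha1(crlf_file)

# Hashing after end-of-line conversion, comparable to what ends up in the
# object database when the attributes request LF normalization.
filtered_hash = git_blob_sha1(normalize_eol(crlf_file))

print("raw     :", raw_hash)
print("filtered:", filtered_hash)
print("match   :", raw_hash == filtered_hash)
```

The two hashes differ because the hashed bytes differ. That matches the mismatch reported here: committed files are hashed after filtering, while freshly pruned files are hashed raw.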

In the meantime, there are two workarounds you could try if this is blocking for you:

  • Add the pruned output to a commit, e.g. git add out && git commit -m 'pruned'. This writes all of the hashes to the object database with .gitattributes respected.
  • Manually add the file hashes to the database: git hash-object -w out/apps/app-a/README-dos-dos.md
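
If many files are affected, the second workaround could be scripted. The following is a rough sketch rather than a tested recipe: the function names are invented for this example, it assumes it is run from the repository root with git on the PATH, and it runs git hash-object -w once per file under the given directory.

```python
import subprocess
from pathlib import Path


def hash_object_commands(root: Path) -> list[list[str]]:
    """Build one `git hash-object -w <file>` command per regular file under root."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return [["git", "hash-object", "-w", str(p)] for p in files]


def write_hashes(root: Path) -> None:
    """Run the commands; raises CalledProcessError if git reports a failure."""
    for cmd in hash_object_commands(root):
        subprocess.run(cmd, check=True)
```

Calling write_hashes(Path("out")) after turbo prune would apply the workaround to every pruned file. Whether this is preferable to committing the output depends on how large out/ is.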