microsoft/rushstack

[rush](package-deps-hash): uncaught errors happening when using Rush commands spawning git commands to calculate the repo state

antoine-coulon opened this issue · 4 comments

Summary

Hello,

While trying to bootstrap a Rush toolchain with Docker, I encountered an issue when running commands associated with repo state (rush build or any other bulk command) and more precisely interacting with git under the hood (I might report other issues soon related to that, but one problem at a time), I'll put the full error below.

When interacting with git in a Docker environment or any other environment, git might be there or might not, or maybe wrongly configured, whatever. I think Rush should work anyway and just skip repo state calculation when it can't rely on git in restrained environments and not crash.

Repro steps

  1. Clone the following repository https://github.com/antoine-coulon/rushelp. FYI this is a repository I created to simplify my way of reproducing Rush.js issues and working on them. The Dockerfile might be wrongly configuring git for Rush to work properly, but in that case its helpful (maybe we could fix it afterwards).

  2. Build the image and run the container. Then cd /build and from there you can run rush my-bulk-command or rush build and you'll get the error described just below.

Expected result:

I expect all the Rush commands relying on repo state to be working even if the repo state could not be computed for some reason.
Also, I would expect the warning from _getRepoDepsAsync to be emitted such that the one I got by investigating (I will open the PR):

Error calculating the state of the repo. (inner error: Error: git --no-optional-locks exited with code 128: fatal: Unable to hash bin ). Continuing without diffing files.

Actual result:

The actual result is an uncaught error that makes the process exit hence does not allow Rush to complete its command:

node:events:495
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE
    at afterWriteDispatched (node:internal/stream_base_commons:160:15)
    at writeGeneric (node:internal/stream_base_commons:151:3)
    at Socket._writeGeneric (node:net:962:11)
    at Socket._write (node:net:974:8)
    at writeOrBuffer (node:internal/streams/writable:392:12)
    at _write (node:internal/streams/writable:333:10)
    at Writable.write (node:internal/streams/writable:337:10)
    at Readable.ondata (node:internal/streams/readable:809:22)
    at Readable.emit (node:events:517:28)
    at addChunk (node:internal/streams/readable:368:12)
Emitted 'error' event on Socket instance at:
    at Socket.onerror (node:internal/streams/readable:828:14)
    at Socket.emit (node:events:517:28)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  errno: -32,
  code: 'EPIPE',
  syscall: 'write'
}

Details

What do you think is the cause of this problem?

The cause of this problem is basically no error handlers when spawning git commands there on various streams on the spawned child process and given its async nature it just bubbles up in an uncaught way. More precisely, it seems to be related to process.stdin in my example but it could be due to any other streams.

Note also that it might leak memory it these specific cases because we don't close streams in case of errors.

How do you think we should fix this?

I think we should fix it by attaching error handlers when spawning git commands hence allowing rush-lib to process the remaining work and fallbacking to no cache. I'll open a PR with a suggestion, feel free to discuss it there.

Standard questions

Please answer these questions to help us investigate your issue more quickly:

Question Answer
@microsoft/rush globally installed version? 5.112.0
rushVersion from rush.json? 5.112.2
useWorkspaces from rush.json? Yes
Operating system? Mac / Linux
Would you consider contributing a PR? Yes, I'm already working on the PR
Node.js version (node -v)? 18.17.0

Hey again, by looking a bit more into similar issues I found #215 and #3517 to be linked.

I exactly get the same error when looking into stderr: Error calculating the state of the repo. (inner error: Error: git --no-optional-locks exited with code 128: fatal: Unable to hash bin, however because the process does not have "error" handler this just does not have a chance to be printed in the terminal and I think this kind of error should be caught and still emit the warning as it is shown in #215 and #3517, instead of crashing the process as I'm witnessing it.

The issue comes from using git hash-object on bin, for some reason /bin is tracked by git in the container but also it is a directory so git hash-object can't hash it if I'm not mistaken.

So I'll manage to get the error to be caught for this warning to be emitted, but also trying to look if we're not yielding incorrect values to be hashed (file directories instead of file paths).

cc @dmichon-msft, this might interest you, I think you worked on package-deps-hash :)

Okay so I think I know why it tries to hash bin and also why it fails, it's because bin is a symlinked directory on many Linux distribs so it's exactly the same problem as #3517 regarding the root issue.

I might keep discussing that issue in #3517, I'll just keep that issue open regarding the uncaught exception following the symlink problem.

If wrapping this Git operation in an error handler would unblock your scenario, would you be interested in opening a PR to do that?

Looks like you already did. Thanks!