[Regression in 14.1.0 - Windows] `stdout` is sometimes empty
nicolo-ribaudo opened this issue · 148 comments
- Version: 14.1.0
- Platform: Windows 64-bit on VirtualBox and on Travis CI
- Subsystem: N/A
Bug description
After the Node.js 14.1.0 release, Babel's CI tests on Windows started failing. The failures are all related to @babel/cli and @babel/node: we run those CLI programs, capture their output (from stdout) and compare it with the expected output.
Sometimes the generated stdout is empty: you can see an example here or here. In this CI log you can also see some error messages coming from V8 internals in @babel/node tests, but I don't know if it's the same problem.
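For readers unfamiliar with Babel's test setup, here is a minimal sketch (not the actual test harness; the paths and flags are illustrative) of what the failing tests do: spawn the CLI as a child process, collect its stdout, and compare it with the expected output.

const { spawn } = require("child_process");

// Illustrative invocation; the real tests run packages/babel-cli/lib/babel
// with various presets and plugins (see the failing test output below).
const child = spawn(process.execPath, ["packages/babel-cli/lib/babel", "--version"]);

let stdout = "";
child.stdout.on("data", chunk => {
  stdout += chunk;
});
child.on("close", () => {
  // On Node.js 14.1.0 on Windows this is sometimes "" even though the child
  // actually printed output.
  console.log(JSON.stringify(stdout));
});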
How often does it reproduce? Is there a required condition?
My guess is that every test has about a 1% chance of failing. However, we have ~100 tests for those two packages, so something fails more often than not.
What steps will reproduce the bug?
I couldn't find a small and isolated reproduction example. I'll keep trying to create one, but here is what I have for now.
Also, building Babel on Windows is painful and I couldn't run the full test suite. However, I managed to reproduce the bug.
I'm running these commands using PowerShell, Node.js 14.1.0 and Yarn 1.22:
# Clone the babel repository
git clone https://github.com/babel/babel.git .
# Install deps
yarn
yarn lerna bootstrap # This takes a while
# Build Babel (this won't build everything, but enough packages to show the bug)
yarn gulp build-no-bundle
# Run the @babel/cli and @babel/node tests
yarn jest --runInBand babel-node babel-cli
If you don't see the last command failing, try running it two or three times. The --runInBand option isn't necessary to reproduce the problem, but it disables Jest's workers, so it removes one possible cause.
What is the expected behavior?
Tests should pass
What do you see instead?
An example of a failure when running locally is this output:
PS C:\Users\Nicolò\Downloads\babel-master\babel-master> yarn jest --runInBand babel-node babel-cli
yarn run v1.22.4
$ C:\Users\Nicolò\Downloads\babel-master\babel-master\node_modules\.bin\jest --runInBand babel-node babel-cli
FAIL packages/babel-cli/test/index.js (33.099s)
● bin/babel › --only glob
"C:\Users\Nicolò\Downloads\babel-master\babel-master\packages\babel-cli\lib\babel" "--presets" "C:\Users\Nicolò\Downloads\babel-master\babel-master\packages\babel-preset-react" "--plugins" "C:\Users\Nicolò\Downloads\babel-master\babel-master\packages\babel-plugin-transform-arrow-functions,C:\Users\Nicolò\Downloads\babel-master\babel-master\packages\babel-plugin-transform-strict-mode,C:\Users\Nicolò\Downloads\babel-master\babel-master\packages\babel-plugin-transform-modules-commonjs" "src" "--out-dir" "lib" "--only" "**/*.foo.js" "--verbose": expect(received).toBe(expected) // Object.is equality
Expected: "src/a.foo.js -> lib/a.foo.js
src/baz/b.foo.js -> lib/baz/b.foo.js
Successfully compiled 2 files with Babel."
Received: ""
85 | expect(stdout).toContain(expectStdout);
86 | } else {
> 87 | expect(stdout).toBe(expectStdout);
| ^
88 | }
89 | } else if (stdout) {
90 | throw new Error("stdout:\n" + stdout);
at assertTest (packages/babel-cli/test/index.js:87:22)
at ChildProcess.<anonymous> (packages/babel-cli/test/index.js:152:9)
PASS packages/babel-node/test/index.js (22.924s)
Test Suites: 1 failed, 1 passed, 2 total
Tests: 1 failed, 89 passed, 90 total
Snapshots: 0 total
Time: 56.088s, estimated 62s
Ran all test suites matching /babel-node|babel-cli/i.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Additional information
Can you confirm this is happening only on Windows? This is wildly unexpected.
I can't reproduce on OSX. @nicolo-ribaudo any chance you could bisect? I don't have a Windows machine at the moment.
Yes, I can only reproduce this on Windows (not the V8 errors, only the empty stdouts). I'll try to bisect.
@ronag I’m not sure… Stack traces on Windows are basically garbage unless you take some extra steps (that I don’t know how to take), and given that the error message isn’t particularly helpful here either – there are over 2000 instances of UNREACHABLE(); in V8 – I’m not quite sure where to start here.
/cc @nodejs/v8 @nodejs/platform-windows
Update: this might not be a Node.js bug.
Today I made two local clones of the Babel repository: one ~6 hours ago, where I can consistently reproduce the errors, and one ~2 hours ago, where I cannot. This makes me think that maybe it's a bug in a dependency that didn't support Node.js 14.1.0 on Windows, and it was recently updated to fix the bug.
Even if there were a bug in a dependency, I don't think it should trigger the V8 internal error, but I cannot reproduce that one anyway 🤷
I'll try to re-run the build on Travis CI a bunch of times, and see if I can reproduce it there.
This makes me think that maybe it's a bug in a dependency that didn't support Node.js 14.1.0 on Windows, and it was recently updated to fix the bug.
Can you diff the dependencies?
Yeah I'm trying. I'm on a Windows VM and I haven't used Windows for a while, so it might take some time 😛
Also, another thing I noticed: in the folder where I can (consistently!) reproduce the problem, I can only reproduce it when using Node.js 14.1.0. If I use a build from master (4b2d958) it doesn't fail.
If I use a build from master (4b2d958) it doesn't fail.
Yea, a bisect would be very useful.
I gave up for today, I will continue tomorrow.
Also, I noticed that Jest's own tests have the same problem: https://github.com/facebook/jest/pull/9934/checks?check_run_id=633574558 (cc @SimenB)
Not sure if it's related, but I got win10 + nodejs@14 + babel on CI with a flaky ELIFECYCLE/3221225477 exit code in one of my repos: nodejs/help#2660
Update:
I've reduced the CI exit test case to some of my source files and babel + 2 babel plugins.
The code is at: https://github.com/dr-js/dr-dev/tree/8e111ec5194adc3e159db2d3bc3594f8587dc230
The latest CI fail: https://github.com/dr-js/dr-dev/actions/runs/92784081
Update 2:
With more debug logging I found that the exit happens before the @babel/env plugin gets loaded, after minify-replace and module-resolver are loaded.
Related CI run: https://github.com/dr-js/dr-dev/actions/runs/92845322
Yeah, Jest is failing often on Node 14.1.0 on Windows with empty stdout in one test or the other almost every test run. The V8 thing is actually in the latest build on master: https://github.com/facebook/jest/runs/634830924
Ok, have no idea where to start bisecting. I can still consistently reproduce the failures with the Node.js 14.1.0 installed with the installer, but not if I build 9ffd8e7.
Ok, have no idea where to start bisecting. I can still consistently reproduce the failures with the Node.js 14.1.0 installed with the installer, but not if I build 9ffd8e7.
@nodejs/releasers can you check if there could be any differences between what is available via the installer and what was tagged?
I think there's no way to be sure it's the promoted one, but I can confirm that the latest release build was done with commit 9ffd8e7
Refs: https://ci-release.nodejs.org/job/iojs+release/5995/
The times of the file on the server and in the build logs also match.
@jasnell I know you have a beefy Windows machine, could you check this out and run a bisect?
Yeah, Jest is failing often on Node 14.1.0 on Windows with empty stdout in one test or the other almost every test run. The V8 thing is actually in the latest build on master: https://github.com/facebook/jest/runs/634830924
@SimenB This is worrisome. Are you using any wasm inside Jest?
I'm building the release commit on my machine (Win10, VS2019)
@mcollina nope, the only experimental stuff we ship is the VM ESM APIs, but that code is guarded behind a check that vm.SyntheticModule is defined, which is flagged.
EDIT: That said, some dependency might? Seems unlikely, though – Node 14.0 behaved fine, and both macOS and Linux on 14.1 behave fine
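For context, here is a minimal sketch of the kind of guard described above (illustrative only, not Jest's actual code): vm.SyntheticModule is only defined when Node.js is started with --experimental-vm-modules, so the ESM-specific code path is skipped otherwise.

const vm = require("vm");

// vm.SyntheticModule only exists behind the --experimental-vm-modules flag.
const supportsVmModules = typeof vm.SyntheticModule === "function";

if (supportsVmModules) {
  // The ESM-specific path (vm.SourceTextModule / vm.SyntheticModule) would run here.
} else {
  // Fall back to the regular vm.Script / CommonJS path.
}
console.log("vm ESM APIs available:", supportsVmModules);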
Well, we'd need a simpler and more reliable way to reproduce this.
I cloned Jest and tried to run e2e\__tests__\toThrowErrorMatchingInlineSnapshot.test.ts multiple times with my local build and the release, but it never errors.
That said, some dependency might? Seems unlikely, though – Node 14.0 behaved fine, and both macOS and Linux on 14.1 behave fine
The V8 stack trace you posted shows a failure within some code in the wasm namespace in V8.
Hopefully @targos can shed some light on this.
Commits between 14.0.0 and 14.1.0: v14.0.0...v14.1.0
Three of them touch deps/v8:
- bbed1e5 only changed a test, cannot affect releases.
- b50333e refactored the native code for Intl.NumberFormat
- 56f50ae fixed a bug related to inline caches. The linked issue (https://bugs.chromium.org/p/chromium/issues/detail?id=1053939) is under embargo so I guess that's related to security in Chromium. I do not know if those code paths are hit by WASM modules.
I'm now trying to remove as much code as possible from my local failing clone of Babel to reduce the number of possible error causes.
This seems to be not babel specific: https://github.com/pinojs/pino/runs/642073186. Again, this is consistently flaky :(.
@targos did you run the tests in Linux WSL or using the normal bin? I have a hunch that GitHub CI is using WSL.
I used the normal bin in PowerShell
Would you mind to try out if it works in WSL?
@mcollina what exactly do you want me to try?
BTW I cloned pino and ran npm i and npm t with Node.js 14.1.0 on Windows, and all tests pass 😕
Can you do the same inside WSL?
No problem inside WSL either.
I think this may be only reproducible if stdout is not a terminal. Is there a simple way to test this on Windows?
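One quick way to check that condition (a small sketch; check-tty.js is a hypothetical file name) is to look at process.stdout.isTTY, which is set in an interactive console and undefined when stdout is a pipe or redirected to a file:

// check-tty.js
console.log("stdout is a TTY:", Boolean(process.stdout.isTTY));

// In PowerShell:
//   node check-tty.js               -> stdout is a TTY: true
//   node check-tty.js | Out-String  -> stdout is a TTY: false
//   node check-tty.js > out.txt     -> stdout is a TTY: false (written to out.txt)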
Note that which test fails on Jest's CI varies, and not all test runs even have a failure. So I think this might be super hard to nail down a consistent reproduction for 😟
@nicolo-ribaudo given that you are the only one with a repro on your development machine, could you try to isolate which commit is causing the problem (e.g. bisect)?
I can try again, but if I build the release commit with .\vcbuild I can't reproduce the issue.
I can try again, but if I build the release commit with .\vcbuild I can't reproduce the issue.
Oh sorry, then don't bother :(. So this only happens with the binary we have built.
Maybe in 14.1.1, with a new binary, it will magically fix itself 🤷‍♂️
@nicolo-ribaudo are you running the tests in WSL, cmd, or PowerShell? Neither @targos nor I can repro them.
I'm using PowerShell.
I don't know if it's possible/legal, but I can try to share my virtualbox image with you.
I ran a test rebuild of the v14.1.0 release commit 9ffd8e7 on the Node.js infrastructure – this should help rule out anything obscure having gone wrong with the published release build.
https://nodejs.org/download/test/v14.1.0-test202005049ffd8e7a74/
@nicolo-ribaudo do you know how I can run jest with custom Node runtime on Windows?
I used path/to/node.exe ./node_modules/jest/bin/jest babel-node babel-cli.
@BethGriggs I'll check it later, thanks!
I don't know if it's relevant to this specific issue, but I did notice that between 13.11 and 13.12, when any stream piping to process.stdout or process.stderr closes, process.stdout/process.stderr end up closing, allowing no further output. This behavior is carried over into 14.x and is present on all platforms...
To give it a quick test, nvm use 13.11 and try...
$ node
Welcome to Node.js v13.11.0
Type ".help" for more information.
> const w = new stream.PassThrough()
undefined
> stream.pipeline(w, process.stdout, () => {})
<truncated>
> w.end('hello')
<truncated>
Then nvm use 13.12 and do the same thing.
You'll see that on 13.11, process.stdout is still usable, while on 13.12 and later it is not.
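A self-contained variant of that REPL experiment as a script (this only reproduces the behavior described above; exact error codes may differ between versions):

const stream = require("stream");

// Handle the error event so a write-after-end on newer versions doesn't
// crash the process with an uncaught exception.
process.stdout.on("error", err => console.error("stdout error:", err.code));

const w = new stream.PassThrough();
stream.pipeline(w, process.stdout, err => {
  console.error("pipeline callback:", err ? err.code : "no error");
  // On 13.11 this line still reaches the terminal; on 13.12+ stdout has
  // already been closed by pipeline() and nothing more is printed.
  process.stdout.write("is stdout still writable?\n");
});
w.end("hello\n");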
/cc @ronag
@jasnell I don't think that it is related, because the problem reported in this issue started appearing in 14.1.0
I tested the test release and I could still reproduce the bug.
@BethGriggs is there any flag I should use to produce a release-like build locally?
Also, I tried to think about what my environment has in common with CI servers, to understand why I'm the only one who can reproduce the bug.
The only thing I noticed is that this VM has very little available disk space (1.5GB), which might be similar to CI.
I can replicate the error by running the babel tests with the shipped release. I cannot replicate it by manually building the release commit.
I can confirm there is something odd happening here.
For Windows builds our Release CI calls vcbuild.bat build-release x64 upload. This is building on Windows 2012 R2 (Rackspace) with Visual Studio 2019.
11:46:03 c:\ws>vcbuild.bat build-release x64 upload
11:46:03 Looking for Python
11:46:03 Python found in C:\Python27\\python.exe
11:46:03 Looking for NASM
11:46:03 Looking for Visual Studio 2019
11:46:03 Looking for WiX installation for Visual Studio 2019...
11:46:03 calling: "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\\Auxiliary\Build\vcvarsall.bat" amd64
11:46:04 **********************************************************************
11:46:04 ** Visual Studio 2019 Developer Command Prompt v16.5.4
11:46:04 ** Copyright (c) 2019 Microsoft Corporation
11:46:04 **********************************************************************
11:46:04 [vcvarsall.bat] Environment initialized for: 'x64'
11:46:04 Found MSVS version 16.0
11:46:04 configure --with-ltcg --release-urlbase=https://nodejs.org/download/test/ "--download=all" --with-intl=full-icu --dest-cpu=x64 --tag=test202005049ffd8e7a74
11:46:11 Node.js configure: Found Python 2.7.18...
11:46:11 Warning: Missing input files:
11:46:11 tools\msvs\genfiles\node_etw_provider.rc
11:46:11 tools\msvs\genfiles\node_etw_provider.h
11:46:11 INFO: configure completed successfully
@jasnell FYI the babel test error is not reproducible on either Node.js 13.12 or 13.13: https://travis-ci.com/github/JLHwung/babel/builds/163828049
Unfortunately that's what I suspected :-(
@mcollina can you please try moving the commit built binary to a different folder and try running the tests again?
@mcollina can you please try moving the commit built binary to a different folder and try running the tests again?
Same
@BethGriggs I am using Visual Studio 2017. I'm trying to run vcbuild.bat build-release x64 but it cannot find WiX. I do not know how to install that :/ – there are no guides linked in BUILDING.md.
I don't think build-release creates a different node.exe, but I might be wrong. WiX is needed to create the .msi installer, but that's separate from creating node.exe.
build-release enables LTCG.
I can confirm that the releases of 14.2.0 do not solve this issue.
try .\vcbuild.bat release (no need for Wix). That should enable LTCG
This build is not causing any errors either.
@mcollina could you give me the steps you followed to reproduce the bug on your machine with the release build? I have the same version of Visual Studio as CI on my machine, so we could at least eliminate that difference if I'm able to reproduce.
# Clone the babel repository
git clone https://github.com/babel/babel.git .
# Install deps
yarn
yarn lerna bootstrap # This takes a while
# Build Babel (this won't build everything, but enough packages to show the bug)
yarn gulp build-no-bundle
# Run tests with custom node
path\to\node.exe node_modules\jest\bin\jest babel-node babel-cli
Using the release node.exe fails, while using the one I built does not.
(I downloaded the zip file with the binary.)
One thing that might be worth testing is running with --run-in-band. Without this flag Jest will spawn processes to parallelize test runs; the flag disables this behavior and runs everything in the main process. It might eliminate one source of the error? (If you only have a single core the tests will already run in band, so this might not be applicable to VMs and such, depending on how accurate os.cpus() is in those cases.)
Process/worker abstraction, for reference: https://github.com/facebook/jest/tree/master/packages/jest-worker
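As a rough illustration of the reasoning in that comment (an assumption about the single-core case, not Jest's actual code), whether workers are spawned at all can depend on what os.cpus() reports inside a VM:

const os = require("os");

const cliRunInBand = process.argv.includes("--runInBand");
// Illustrative decision logic: with only one reported CPU, running in band
// is effectively forced even without the flag.
const runInBand = cliRunInBand || os.cpus().length <= 1;

console.log("reported CPUs:", os.cpus().length);
console.log(runInBand ? "tests run in the main process" : "tests run in jest-worker child processes");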
I still can't reproduce using the official release (with or without --runInBand) :(
PS D:\Git\forks\babel> node .\node_modules\jest\bin\jest.js --runInBand babel-node babel-cli
PASS packages/babel-cli/test/index.js (49.684s)
PASS packages/babel-node/test/index.js (33.254s)
Test Suites: 2 passed, 2 total
Tests: 90 passed, 90 total
Snapshots: 0 total
Time: 83.059s, estimated 84s
Ran all test suites matching /babel-node|babel-cli/i.
The fact that we cannot produce a build that causes the failure on a machine worries me a lot.
Do you think it would be possible to do a manual git bisect through the build server? I can download and run the builds on my machine to see which one causes the failure.
Step 1, with f92b398: https://ci-release.nodejs.org/job/iojs+release/6010/
I do not have permission to access that Jenkins instance.
14.0.0 works?
Yes.
I'll post links to the downloads when they are ready.
@mcollina I did an rc.0 build for v14.1.0 on f878917 (that was prior to pulling in additional requested V8 and streams PRs, IIRC). It might be worth trying - https://nodejs.org/download/rc/v14.1.0-rc.0/
I'm also wondering whether this could in some way be a VS 2019 vs VS 2017 issue? Just based on the release machine and @targos using VS 2019, and not being able to reproduce the issue.
Also ping @nodejs/build in case something has changed with our Windows release infrastructure recently
@mcollina Yarn might not work with those binaries if they have a prerelease version. If it doesn't, you need to run the first commands with 14.1.0 or 14.0.0, and then only jest with the test build.
I'll also test those binaries so that we don't risk a false negative, since the failures are flaky.
Yarn might not work with those binaries if they have a prerelease version. If it doesn't, you need to run the first commands with 14.1.0 or 14.0.0, and then only jest with the test build.
That's what I have been doing. I got a nice crack on this.
@BethGriggs the test build is finished. Do I need to do something to publish it to the website?
Edit: trying again with a datestring: https://ci-release.nodejs.org/job/iojs+release/6013/
It says that it's only needed for nightly but I see you provided it yesterday too.
I'm also wondering whether this could in some way be a VS 2019 vs VS 2017 issue?
@BethGriggs that's my working theory; however, it seems a bit of a long shot. Have we made any changes to the Windows machines between 14.0.0 and 14.1.0?
I did an rc.0 build for v14.1.0 on f878917 (that was prior to pulling in additional requested V8 and streams PRs, IIRC). It might be worth trying - https://nodejs.org/download/rc/v14.1.0-rc.0/
This still fails.
Does this fail with any of the nightlies (if it does then bisecting might be quicker if we can narrow down a passing/failing set of builds)? https://nodejs.org/download/nightly/
@targos recent changes to our caching means it can take up to ~30minutes for a build to be available on our download server, so you may hit that.
Does this fail with any of the nightlies (if it does then bisecting might be quicker if we can narrow down a passing/failing set of builds)? https://nodejs.org/download/nightly/
This is not possible because of some other semver-major changes in master that are actually breaking babel. It's emitting some deprecation warnings that make the babel tests fail, so it's very hard to get a yes/no.
I'm testing the nightly versions:
- 23-Apr-2020 - works
- 24-Apr-2020 - works
- 25-Apr-2020 - works
- 26-Apr-2020 - broken
"works" = "passed 4 attempts"
@mcollina I disabled the stderr assertion, so that the warning doesn't break the tests. If you want to try it, you can comment out the if starting with if (opts.stderr) { in packages/babel-cli/test/index.js and packages/babel-cli/test/index.js
@BethGriggs It's been almost an hour and the build is still in staging. Did your test from yesterday get moved automatically?
I'm testing the nightly versions:
- 24-Apr-2020 (https://nodejs.org/download/nightly/v15.0.0-nightly202004240619b600b2/) - works
- 26-Apr-2020 (https://nodejs.org/download/nightly/v15.0.0-nightly20200426f8d5474839/) - broken
"works" = "passed 4 attempts"
These are the commits between those two nightlies:
0619b60...f8d5474
I'm downloading 25-Apr-2020 to reduce that range.
EDIT: 25-Apr-2020 works.
@BethGriggs It's been almost an hour and the build is still in staging. Did your test from yesterday get moved automatically?
@targos I think your first build might not be promoted due to the promotion script expecting a specific format - nodejs/build#2309 (comment). My test build uploaded itself after around 30 minutes.
Got it!
@mcollina The first build of this interactive bisect session is here: https://direct.nodejs.org/download/test/v14.0.1-test20200505f92b398c76/
Tell me the result if you want to continue :)
My guesstimates are ab7d9db or ab7d9db
Those two are the same, I think you meant two different ones?
003fb53 and ab7d9db
Next build is on commit 4b6aa07: https://direct.nodejs.org/download/test/v14.0.1-test202005054b6aa077fe/
Next build is on commit 4b6aa07: https://direct.nodejs.org/download/test/v14.0.1-test202005054b6aa077fe/
Pass (4 attempts)
Next build is on commit 4b6aa07: https://direct.nodejs.org/download/test/v14.0.1-test202005054b6aa077fe/
Pass
Next build is on commit 23eda41: https://direct.nodejs.org/download/test/v14.0.1-test2020050523eda417b6/
Next build is on commit 23eda41: direct.nodejs.org/download/test/v14.0.1-test2020050523eda417b6
Fail
Next build is on commit 1af08e1: https://direct.nodejs.org/download/test/v14.0.1-test202005051af08e1c5e/
Still building, but I'm going to bed. Expect the link to work in ~45 minutes.
Let's continue tomorrow!
Hey, I'm ready to build more as needed :)
Next build is on commit 1af08e1: https://direct.nodejs.org/download/test/v14.0.1-test202005051af08e1c5e/
Fail
Next build is on commit 2bb4ac4: https://direct.nodejs.org/download/test/v14.0.1-test202005062bb4ac409b/
Next build is on commit 2bb4ac4: https://direct.nodejs.org/download/test/v14.0.1-test202005062bb4ac409b/
Fail
Next build is on commit 4143c74: https://direct.nodejs.org/download/test/v14.0.1-test202005064143c747fc/
(only one revision left to test after that)
Next build is on commit 4143c74: direct.nodejs.org/download/test/v14.0.1-test202005064143c747fc
Fail
I'm very confused (if I haven't missed any pass/fail, the bug is in the second or third commit of 737bd62...4143c74).
Last build is on commit c21f1f0: https://direct.nodejs.org/download/test/v14.0.1-test20200506c21f1f03c5/
Last build is on commit c21f1f0: https://direct.nodejs.org/download/test/v14.0.1-test20200506c21f1f03c5/
Pass.
I guess all of us who tried to guess the bad commit are bad guessers 😛
@mcollina Could you confirm that #33166 (comment) passes and #33166 (comment) fails?
That one's backported to v13, fwiw, so seems unlikely 😕 That said, Jest makes heavy use of compileFunction (although not that new option yet) and vm in general
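For context, a minimal example of the vm.compileFunction API mentioned above (this shows the Node.js API in isolation, not how Jest wires it up):

const vm = require("vm");

// Compile a function body with named parameters; Jest uses this API to wrap
// module source code before executing it.
const add = vm.compileFunction("return a + b;", ["a", "b"]);
console.log(add(2, 3)); // 5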