bcoe/c8

Running tests with coverage causes CPU to hit 100% after tests finish

achingbrain opened this issue · 20 comments

  • Version: v18.14.0
  • Platform: windows-latest, ubuntu-latest and macos-latest GitHub CI images

When running tests with coverage in CI, after the test run completes successfully, CPU spikes to 100% and the process hangs.

It happened in this build: https://github.com/ipfs/aegir/actions/runs/4225847659/jobs/7338650232 and sometimes happens locally too, though it's a lot more frequent in CI.

This started happening with node 18, it didn't seem to happen with node 16.

The command being run is similar to:

c8 --reporter json --report-dir .coverage --temp-directory /tmp/some-random-string --clean mocha test/node.{js,cjs,mjs} test/**/*.spec.{js,cjs,mjs} dist/test/node.{js,cjs,mjs} dist/test/**/*.spec.{js,cjs,mjs} --ui bdd --require source-map-support/register --timeout=60000 --bail

Same environment, same symptoms here: https://github.com/mmomtchev/sqlite-wasm-http/actions/runs/4377210809/attempts/1
Haven't been able to reproduce locally, only in GA

Node v19 does not solve it

How large does the --temp-directory / coverage/tmp grow? Could you check this with du -sh or similar tool? And how many files are there?
This can be checked in local environment as well.

44 MB in 90 files, all in coverage/tmp

That doesn't sound too bad. I've seen cases where CIs get slow when coverage/tmp reaches +450 MB. The process may seem stuck but it's actually just processing a lot of JSON files and takes a while.

But getting a CI stuck with 44MB files sounds like a different issue.

The CI build I linked to was killed after having stalled for six hours. Prior to the appearance of this bug processing the coverage report in CI took seconds so I definitely think it's a different issue!

I am running into the same issue. Able to run the tests and coverage successfully locally inside the applications' docker container (based on node:18.14.2-bullseye-slim). Using the same container in the Gitlab CICD environment I do get test results on stdout but hangs when generating coverage. Using the following Gitlab config (already commented out most of the post-processing stuff for debugging):

...
unit_test:
  stage: test
  image: $TEST_IMAGE_NAME # Build in the previous step based on the Dockerfile in the repo.
  variables:
    # We do not need GitLab to clone the source code.
    GIT_STRATEGY: none
    NODE_ENV: dev
  before_script:
    - cd /code
    - npm ci
  script:
    - npx c8 mocha
  # after_script:
  #   - mv test-results.xml ${CI_PROJECT_DIR}
  #   - mv coverage ${CI_PROJECT_DIR}
  # artifacts:
  #   when: always
  #   paths:
  #     - test-results.xml
  #     - coverage/cobertura-coverage.xml
  #   expire_in: 10 years
  #   reports:
  #     junit: test-results.xml
  #     coverage_report:
  #       coverage_format: cobertura
  #       path: coverage/cobertura-coverage.xml
  <<: *rules-default
...

My coverage/tmp only gets to 2MB (really small app)

Try adding NODE_DEBUG: 'c8' environment variable to see if c8 logs any errors.

@AriPerkkio It seems that Windows is not affected

PS. Finally, Windows is affected too, but the frequency is much lower

@achingbrain Did you find any solution?
This error reproduced in the local environment for me with node 16.14 and c8 7.11.3 but it's non-deterministic!

No, currently running tests without coverage due to this issue.

Somebody has to fix it. I can't reproduce it outside of Github Actions. Try attaching the inspector when it happens.

Somebody has to fix it. I can't reproduce it outside of Github Actions. Try attaching the inspector when it happens.

What do you mean by attaching in inspector, how can I help you?

I can reproduce this error in the local environment with node 16.14, I tried to find debug logs to report to you but the attempt was not successful! also, my problem was solved with node:18.14.0 in my GitLab pipeline.

I was able to reproduce it with node:18.14.0, it is not related
https://github.com/mmomtchev/sqlite-wasm-http/actions/runs/4991572755

The workaround that helped me is to enable the exit mode in Mocha (--exit on command line or "exit": true in the config).

Seems that there is some mutual issue in handling the exit between c8 and Mocha running as a child process. I suspected that there is some loop or wait process on the Mocha side waiting for something to finish after waiting for the tests.

I was unable to attach the debugger to the Mocha child process, Node seems to be in the phase when it does not accept debugger connections anymore.

Then I tried wtfnode in attempt to dump the remaining handlers. The only way of using wtfnode that worked for me is to insert wtf.dump() in the the Mocha's exitMochaLater handler. It gave basically empty result for me:

[WTF Node?] open handles:
- File descriptors: (note: stdio always exists)
  - fd 2 (tty) (stdio)
  - fd 1 (tty) (stdio)
  

If I'm not missing anything, then I guess what happens is that Mocha's process gets stalled waiting until the stdout pipe is read just before exiting, while the c8 process perhaps is not reading enough from this pipe to let Mocha's Node process finish. The workaround helps by making Mocha flush and exit the process explicitly after completing the tests. See the flag description for the dangers of the exit mode: https://mochajs.org/#-exit

I'm using macOS, btw, and the issue is occurring ~10% of time for me locally.

I also have no issues when running Mocha directly with the same arguments.

Still having the issue with "exit": true

bcoe commented

I wonder if this could be related to nodejs/node#49344 we should keep an eye and see if the problem continues once patched.

abcfy2 commented

I don't know if my issue is related to this issue.

I use ava with c8. When I use node 18, it's working, but when I upgrade to node 20, c8 will hang forever after ava test finished.

When I use node 18:

$ npm test
...
  ✔ application › should schedule a job (2s)
  ─

  7 tests passed
--------------------------|---------|----------|---------|---------|-------------------------------
File                      | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s             
--------------------------|---------|----------|---------|---------|-------------------------------
All files                 |      83 |    86.36 |   70.83 |      83 |                               
 _core                    |   87.73 |    91.66 |      80 |   87.73 |                               
  application.ts          |   87.73 |    91.66 |      80 |   87.73 | 86-94,167-169,177-178,184-195 
 _core/controller         |   76.92 |      100 |      75 |   76.92 |                               
  route.ts                |   76.92 |      100 |      75 |   76.92 | 35-49                         
 _core/controller/actions |   68.75 |       60 |      75 |   68.75 |                               
  health.ts               |     100 |      100 |     100 |     100 |                               
  siweAction.ts           |   63.41 |       50 |   66.66 |   63.41 | 15-18,28-29,33-41             
 _core/services           |    80.3 |       70 |      25 |    80.3 |                               
  dbService.ts            |      75 |      100 |       0 |      75 | 20-22,25-27,30-31             
  securityService.ts      |   85.29 |       70 |     100 |   85.29 | 11-12,16-17,29                
 _core/utils              |     100 |      100 |     100 |     100 |                               
  common.ts               |     100 |      100 |     100 |     100 |                               
--------------------------|---------|----------|---------|---------|-------------------------------

But when I use node 20, the same code with test will hang:

$ npm test

> sln-a@0.1.0 test
> c8 ava

...
 ✔ application › should schedule a job (2s)

# <==== hang forever and I have to kill node process to stop it
# no tests result output and ctrl-c is not working

Node.js 21.6.0 which includes the fix for nodejs/node#47748 resolved the issue for me.