fireEvent.sigkill fails on some CI systems
crutchcorn opened this issue · 23 comments
While this code runs on all the same versions of Node on Windows within CI, it seems to error out on platforms like Windows.
Log:
https://github.com/crutchcorn/cli-testing-library/runs/4339953929?check_suite_focus=true
It appears to be a permissions issue of being unable to taskkill
(or similar) a PID on Windows machines on GH Actions CI. This makes sense to me, as it may introduce vulnerabilities.
As a temporary workaround, I will be removing that test from CI and simply doing tests manually on Windows myself. Please let us know if there's a better way you know of
OK I'm stumped. I've tried for hours and hours now with no avail. It seems to work fine in CircleCI, but not GitHub actions.
I'm going to mark this as "help needed" and hope the ecosystem has some kind of solution
Note: This library works fine for Windows itself. I will be manually testing everything on my machine to make sure it works and continue to write Windows-specific tests
OK it's not just sigkill
and it's not just Windows
https://github.com/crutchcorn/cli-testing-library/actions/runs/1558975085
There is some timing issue somewhere, and I don't think it's our code, but I could be wrong.
I fixed the timing issue (Node 14 seems to have changed something with spawn
behavior that Node 12 doesn't have)
Sure enough, both Windows and Ubuntu tests pass now
https://github.com/crutchcorn/cli-testing-library/runs/4522732155?check_suite_focus=true
https://github.com/crutchcorn/cli-testing-library/runs/4522795843?check_suite_focus=true
We're running into this issue again in Plop as Windows devices are failing again.
I'm willing to pay a bounty of $50 if anyone is able to solve these issues well enough that Windows CI systems do not fail on GitHub actions on the Plop monorepo 5x in a row.
/bounty $50
💎 $50 bounty created by crutchcorn
🙋 If you start working on this, comment /attempt #3
to notify everyone
👉 To claim this bounty, submit a pull request that includes the text /claim #3
somewhere in its body
📝 Before proceeding, please make sure you can receive payouts in your country
💵 Payment arrives in your account 2-5 days after the bounty is rewarded
💯 You keep 100% of the bounty award
🙏 Thank you for contributing to crutchcorn/cli-testing-library!
Attempt | Started (GMT+0) | Solution |
---|---|---|
🟢 @smartbizlord | Dec 31, 2023, 5:42:05 AM | WIP |
🟢 @Waldeedle | Jan 1, 2024, 8:13:43 PM | #26 |
/attempt #3
Options
OK I'm stumped. I've tried for hours and hours now with no avail. It seems to work fine in CircleCI, but not GitHub actions.
I'm going to mark this as "help needed" and hope the ecosystem has some kind of solution
Note: This library works fine for Windows itself. I will be manually testing everything on my machine to make sure it works and continue to write Windows-specific tests
Hey @crutchcorn Is it possible for me to get the config that you used for circle ci?
@Drex72 it might be in the commit history. Otherwise I wouldn't know where it is :( Sorry I'm not more help
@Drex72 it might be in the commit history. Otherwise I wouldn't know where it is :( Sorry I'm not more help
Alright, no problem
that's cool
i'd check for it
i cant see the logs because it says it doesnt exist so i want to create a repro :)
Failing job in Plop
https://github.com/plopjs/plop/actions/runs/7304409077/job/19906537229 windows-latest
, Node.js 20.x
plop:test
cache miss, executing b00a87c8e47ca0af
> plop@4.0.1 test:instrument
> nyc instrument ./bin ./instrumented/bin && nyc instrument ./src ./instrumented/src && cp package.json ./instrumented
Browserslist: caniuse-lite is outdated. Please run:
npx browserslist@latest --update-db
Why you should do it regularly: https://github.com/browserslist/browserslist#browsers-data-updating
Browserslist: caniuse-lite is outdated. Please run:
npx browserslist@latest --update-db
Why you should do it regularly: https://github.com/browserslist/browserslist#browsers-data-updating
> plop@4.0.1 vitest
> vitest run
RUN v1.1.0 D:/a/plop/plop/packages/plop
✓ tests/actions.spec.js (6 tests | 4 skipped) 6435ms
✓ tests/wrapper.spec.js (5 tests) 15245ms
✓ tests/esm.spec.js (4 tests) 11460ms
✓ tests/action-failure.spec.js (1 test) 3675ms
✓ tests/typescript.spec.js (1 test) 2179ms
✓ tests/input-processing.spec.js (16 tests | 1 skipped) 35128ms
⎯⎯⎯⎯⎯⎯ Unhandled Errors ⎯⎯⎯⎯⎯⎯
Vitest caught 1 unhandled error during the test run.
This might cause false positive tests. Resolve unhandled errors to make sure your tests are not affected.
⎯⎯⎯⎯ Unhandled Rejection ⎯⎯⎯⎯⎯
Error: Command failed: taskkill /pid 1544 /T /F
ERROR: The process with PID 5052 (child process of PID 1544) could not be terminated.
Reason: The operation attempted is not supported.
ERROR: The process with PID 1544 (child process of PID 5[72](https://github.com/plopjs/plop/actions/runs/7304409077/job/19906537229#step:5:74)8) could not be terminated.
Reason: The operation attempted is not supported.
❯ ChildProcess.exithandler node:child_process:422:12
❯ ChildProcess.emit node:events:517:28
❯ maybeClose node:internal/child_process:1098:16
❯ Process.ChildProcess._handle.onexit node:internal/child_process:303:5
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Serialized Error: { code: 128, killed: false, signal: null, cmd: 'taskkill /pid 1544 /T /F' }
This error originated in "tests/input-processing.spec.js" test file. It doesn't mean the error was thrown inside the file itself, but while it was running.
The latest test that might've caused the error is "Should bypass input prompt with name". It might mean one of the following:
- The error was thrown, while Vitest was running this test.
- This was the last recorded test before the error was thrown, if error originated after test finished its execution.
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Test Files 6 passed (6)
Tests 28 passed | 5 todo (33)
Errors 1 error
Start at 22:39:53
Duration 36.38s (transform 179ms, setup 1.64s, collect 657ms, tests [74](https://github.com/plopjs/plop/actions/runs/7304409077/job/19906537229#step:5:76).12s, environment 2ms, prepare 1.[78](https://github.com/plopjs/plop/actions/runs/7304409077/job/19906537229#step:5:80)s)
npm ERR! Lifecycle script `vitest` failed with error:
npm ERR! Error: command failed
npm ERR! in workspace: plop@4.0.1
npm ERR! at location: D:\a\plop\plop\packages\plop
Error: command finished with error: command (D:\a\plop\plop\packages\plop) C:\Users\RUNNER~1\AppData\Local\Temp\xfs-42d[81](https://github.com/plopjs/plop/actions/runs/7304409077/job/19906537229#step:5:83)1ce\yarn.cmd run test exited (1)
Error: plop#test: command (D:\a\plop\plop\packages\plop) C:\Users\RUNNER~1\AppData\Local\Temp\xfs-42d811ce\yarn.cmd run test exited (1)
Tasks: 1 successful, 2 total
Cached: 0 cached, 2 total
Time: 54.193s
Failed: plop#test
ERROR run failed: command exited (1)
Error: Process completed with exit code 1.
@crutchcorn just want to be clear, you are not seeing this issue for yourself locally? I was able to reproduce it on my local windows 11 desktop
@Waldeedle INTERESTING. I, in fact, was not able to reproduce it locally even recently on the Plop repo.
@crutchcorn try cloning the repo to a new folder and just run the exact commands from your test workflow:
I think your hunch is right, I wonder if turbo is not handling child processes correctly because I can only reproduce it on the initial clone of the repo, any subsequent run just works.
EDIT: A process that could not be killed from the first run continues to run in the background throughout these subsequent runs:
Here's some copy-pasta for your convenience
git clone git@github.com:plopjs/plop.git
cd plop/
yarn install --frozen-lockfile
yarn test
@Waldeedle it's interesting it mentions Microsoft AV. What if you turn it off (temporarily) for running? [Only if you trust it, feel free to read the source]
I have mine disabled in favor of a different antivirus and maybe that could be part of the root.
@crutchcorn assuming you meant from my screenshot? Sorry should've clarified that this was from my task manager, its going to show all processes, I will switch over to process explorer so I can see the related processes from the test
/attempt #3
I see where it is erroring out, and believe I have a potential solution.
Before I create a PR, could I get context around, this particular if statement, why are you attempting to kill a process that has already exited/doesn't exist?
if (err.message.includes('could not be terminated') && err.message.includes('There is no running instance of the task.'))
Options
Note
The user @smartbizlord is already attempting to complete issue #3 and claim the bounty. We recommend checking in on @smartbizlord's progress, and potentially collaborating, before starting a new solution.
I see where it is erroring out, and believe I have a potential solution.
Before I create a PR, could I get context around, this particular if statement, why are you attempting to kill a process that has already exited/doesn't exist?if (err.message.includes('could not be terminated') && err.message.includes('There is no running instance of the task.'))
Truthfully I don't remember. I think this was an attempt to fix the problem at hand or work around windows specifically but idk
💡 @Waldeedle submitted a pull request that claims the bounty. You can visit your bounty board to reward.
@Waldeedle: Your claim has been rewarded! 👉 Complete your Algora onboarding to collect the bounty.
@Waldeedle thanks so much for contributing to this project and helping us out! Great work!
@crutchcorn anytime, thanks for the brain teaser! 👏