Crash when using background worker with node.js 18.14.1 and later
matthias-heller opened this issue ยท 12 comments
I was using an own build binary with the pkg-fetch patches for node 18.12.1 (everything was fine)
No as a new patch is available for node 18.14.1 I again build my own binary and this causes some crashes.
I have an application which starts a background worker. With the patches for node 18.12.1 I can run the application fine. I I used the prebuild binary for node 18.14.1 I got a crash while starting the background worker.
node:internal/assert:14 throw new ERR_INTERNAL_ASSERTION(message); ^ Error [ERR_INTERNAL_ASSERTION]: This is caused by either a bug in Node.js or incorrect usage of Node.js internals. Please open an issue with this stack trace at https://github.com/nodejs/node/issues at new NodeError (node:internal/errors:399:5) at assert (node:internal/assert:14:11) at prepareExecution (node:internal/process/pre_execution:89:5) at prepareMainThreadExecution (node:internal/process/pre_execution:41:3) at node:internal/bootstrap/pkg:7:1 { code: 'ERR_INTERNAL_ASSERTION' } Node.js v18.14.1
Output of NODE_DEBUG=* set
The same issue happens also with node 18.14.2 patches in node.js 18.14.2
Thanks for the report @matthias-heller. What OS and arch are you using?
Are you able to provide a minimal reproducer repo that demonstrates the use of the background worker and the crash?
I'm using Windows OS. Windows 10 22H2 to be precise.
About a reproducer repo I have to check I hope I find some time. At the moment it is included in our application and I have to move it out. I can't promise right now, but I'll try
it is admittedly rather hard to create a reproducer since you don't have prebuilt binaries from pkg fetch to use right now, so a setup would have to be pretty custom.
Perhaps just paste in the few lines of generalized code on how you are starting your worker and then I can see if I can reproduce with that?
@baparham: I think I found the issue and maybe even a solution.
The issue occured with node 18.14.1 as there where mayor changes inside node.js. I think the change causing the issue is in file <node_src>/lib/internal/process/pre_execution.js and <node_src>/lib/internal/main/worker_thread.js. Inside pre_execution.js there is a new function called prepareWorkerThreadExecution(). This function is called from worker_thread.js.
Before node.js 18.14.1 the pre_execution.js was just handling the logic for main thread execution it seems. prepareMainThreadExecution and prepareWorkerThreadExecution are both calling the function prepareExecution inside the pre_execution.js. <node_src>/lib/internal/bootstrap/pkg.js (which is part of pkg-fetch) is always calling prepareMainThreadExecution.
Inside prepareExecution() there now are some asserts now which are causing the actual issue e.g.
if (isMainThread) { assert(internalBinding('worker').isMainThread);
In order to load the snapshots of pkg it seems pkg-fetch is patching the file /src/node.cc file inside function MaybeLocal StartExecution(Environment* env, StartExecutionCallback cb). pkg-fetch is adding a call to StartExecution(env, "internal/bootstrap/pkg");
I think (but I'm neither a node.js nor a pkg-fetch expert). Calling this function is just at the wrong place. At the moment it is called after
if (!env->snapshot_deserialize_main().IsEmpty()) {
return env->RunSnapshotDeserializeMain();
}
Which is fine in case no worker threads are being used. In the past this didn't assert in worker threads as the /lib/internal/main/worker_thread.js didn't use the function prepareExecution. So node.js was executing the remaining parts of node.cc file e.g.
if (env->worker_context() != nullptr) {
return StartExecution(env, "internal/main/worker_thread");
}
This code was actually starting the execution of the actual worker thread. Now prepareMainThreadExecution and prepareWorkerThreadExecution use the common function prepareExecution.
In case a worker thread is started due to the applied pkg-fetch code it runs into StartExecution(env, "internal/bootstrap/pkg") which then asserts because prepareMainThreadExecution is getting called but I'm just running inside a worker thread.
So I think instead of calling StartExecution(env, "internal/bootstrap/pkg"); after the
if (!env->snapshot_deserialize_main().IsEmpty())condition it just needs to be moved after the
if (env->worker_context() != nullptr)So that the logic inside node.cc looks like
... // TODO(joyeecheung): move these conditions into JS land and let the // deserialize main function take precedence. For workers, we need to // move the pre-execution part into a different file that can be // reused when dealing with user-defined main functions. if (!env->snapshot_deserialize_main().IsEmpty()) { return env->RunSnapshotDeserializeMain(); }if (env->worker_context() != nullptr) {
return StartExecution(env, "internal/main/worker_thread");
}StartExecution(env, "internal/bootstrap/pkg");
...
But maybe I'm wrong as said I'm not an expert inside node.js. Maybe somebody with deeper understanding can check if my assumption is correct or not.
You could try to make a custom patch with that change to see if it works, I'm not a nodejs expert neighter, maybe @baparham knows
I'll try to further investigate if I find some time. And if we have an official release with node 18.15.0 I can also provide a small sample repo
@baparham @robertsLando
My initial assumption about the change inside node.js is correct. So I think we need a change inside the file lib/internal/bootstrap/pkg.js file provided by pkg-fetch. The assert is really shown from there. At the moment I have not yet an idea how to solve it. I tried to change the file from
use strict'; const { prepareMainThreadExecution } = require('internal/process/pre_execution'); prepareMainThreadExecution(true); ...
to
'use strict'; const { prepareWorkerThreadExecution, prepareMainThreadExecution } = require('internal/process/pre_execution'); if (internalBinding('worker').isMainThread) { prepareMainThreadExecution(true); } else { prepareWorkerThreadExecution(); } ...
which removes the assertion and als helps a little bit to improve but still the worker threads are not running correctly. At the moment I don't have a clue how to further investigate as my node.js implementation knowledge is really limited.
Maybe somebody else can help out
I can at least already for node 18.14.1 confirm, that my last fix is working nicely. I'll try these days also with node 18.15.0 and will keep you updated here. If it works it would be great if somebody can integrate this really small fix into the pkg-fetch source code.
Yeah, let me know how it works out with 18.15.0.
How are you making the edit to test your fix? Applying patches from a modified node repo is not as complicated as it sounds if you are following the instructions here: https://github.com/vercel/pkg-fetch#contributing-updates-to-patches and you'd get to try you hand at submitting a PR for your hard work :-)
I tried it out with node 18.15.0 as well and the fix solves the issue. Therefore I created PR #288
@baparham @robertsLando: Here is now the pull request which solves the issue also in node.js 19.8.1. My local builds and tests are looking good. Hopefully everything works out here as well.