child_process.spawn() or child_process.exec()
Zirafnik opened this issue · 1 comment
Currently this library only supports the creation of Node child processes through the use of `child_process.fork()`.
I ran into a situation where I need to do extensive file processing on my web server. Initially I was looking for solutions in the Node.js space, but found that the same could be accomplished faster and more efficiently with external binaries through the command line (shell).
So now, instead of spawning a new Node process (`.fork()`), I need to execute shell commands which run the desired binaries. This can be done with `.spawn()` or `.exec()`.
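For context, each task boils down to something like this (a minimal sketch; `convert` is just a placeholder for whatever external binary is actually invoked):

```js
const { exec } = require('child_process');
const { promisify } = require('util');
const execAsync = promisify(exec);

async function processFile(inputPath, outputPath) {
  // Run the external binary through the shell; no Node worker code is involved.
  // 'convert' stands in for the actual binary being run.
  const { stdout, stderr } = await execAsync(`convert ${inputPath} ${outputPath}`);
  return { stdout, stderr };
}
```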
Workerpool currently does not support these `child_process` methods, so I cannot create a pool of `exec` workers waiting for their shell commands.
The workaround is creating a `fork` child, which then spawns `exec` children of its own inside the provided function (see the sketch after the example below). I, however, do not fully understand the implications of this (what if there are errors, or the `fork` child dies unexpectedly, ...) and it feels very hacky. Additionally, it forces the server to spawn more child processes than necessary, meaning more threads become occupied.
Ex.:
1x `fork` child -> spawns 3x `exec` children => 4 new child processes created === 4 threads
...
3x `fork` children -> spawn 9x `exec` children => 12 new child processes created === 12 threads
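A minimal sketch of that workaround, using workerpool's dynamic `pool.exec(fn, params)` offloading (the command strings are placeholders):

```js
const workerpool = require('workerpool');
const pool = workerpool.pool({ workerType: 'process' }); // force child_process.fork() workers

// The offloaded function is stringified and sent to a fork worker, so it has to
// be self-contained: it requires child_process inside itself and spawns the
// exec children from there.
function runCommands(commands) {
  const { exec } = require('child_process');
  const { promisify } = require('util');
  const execAsync = promisify(exec);
  // The fork child exists only to fan out N short-lived exec children.
  return Promise.all(commands.map((cmd) => execAsync(cmd)));
}

pool.exec(runCommands, [['convert a.png a.jpg', 'convert b.png b.jpg', 'convert c.png c.jpg']])
  .then((results) => console.log(results))
  .catch((err) => console.error(err));
```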
Instead, we could have just spawned a pool of the necessary `exec` children directly, avoiding the `fork` processes altogether. For example: spawn a pool of 3 workers for each `exec` command => 9 workers, thus avoiding the 3 unnecessary `fork` workers, which are only used to kick off the `exec`s.
Furthermore, the `exec` children cannot be re-used. Each time a `fork` worker gets a task, it would run the provided function, which would first create the 3 `exec` children (expensive) and then kill them. So for each task you would have to re-create the 3 `exec` child processes.
I have not looked at the codebase to understand whether this would be hard to implement, but I imagine most of the code would stay the same (error handling, queue consumption, etc.); only the process-creation step and the input type would differ, along with some options.
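Purely as an illustration of what I mean (this API does not exist; the `workerType` values `'spawn'`/`'exec'` and the call shape are hypothetical):

```js
const workerpool = require('workerpool');

// Hypothetical: a pool whose workers are plain spawn()/exec() children of a
// given binary, instead of forked Node scripts.
const pool = workerpool.pool('convert', {
  workerType: 'spawn', // hypothetical value alongside the existing 'process'/'thread'
  maxWorkers: 3,
});

// Hypothetical: a task is just the arguments (or command string) handed to the binary.
pool.exec(['input.png', '-resize', '50%', 'output.png'])
  .then((result) => console.log(result.stdout));
```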
P.S.: The same logic applies if you are using `worker_threads` instead of child processes to kick off the `exec` child processes. As far as I am aware, the only difference is the shared memory of `worker_threads`, so if one crashes, so does the main thread (undesirable).
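For reference, switching the workaround over to threads is only a matter of the existing `workerType` option; the fanned-out `exec` children stay the same (placeholder command again):

```js
const workerpool = require('workerpool');

// Same workaround, but the outer workers are worker_threads instead of
// child_process.fork() children ('thread' is an existing workerType value).
const pool = workerpool.pool({ workerType: 'thread' });

pool.exec(function (cmd) {
  const { exec } = require('child_process');
  const { promisify } = require('util');
  return promisify(exec)(cmd); // still spawns an exec child per task
}, ['convert a.png a.jpg']);
```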
Related: #261