remy/nodemon

Ability not to signal the whole subtree of a process

Jomshir98 opened this issue · 14 comments

  • Versions: node@v18.18.2, multiple platforms (Linux, Windows)
  • nodemon -v: 3.1.0
  • Operating system/terminal environment (powershell, gitshell, etc): multiple platforms (Linux, Windows)
  • Using Docker? What image: No
  • Command you ran: nodemon --enable-source-maps -r dotenv/config dist/index.js

Expected behaviour

Nodemon allows signaling the main process that it should stop. This process then does necessary cleanup like saving pending data into development database, stops the database or other services and then cleanly exits.
This in turn is noticed by Nodemon and the process is restarted.

Actual behaviour

Nodemon sends the signal to the whole process subtree of the command. As the application starts its own database, the database receives the termination command as well and stops before the main process has chance to finish cleaning up, getting it stuck until timeout happens.

Steps to reproduce

Start any process from inside the watched application that is needed during cleanup phase or that needs special cleanup - not being killed by a signal.
Example of this is setup using mongodb-memory-server package.

Reasoning

This behavior is problematic if the main watched process starts anything external, because no application expects its children to be killed by a different process that is higher up the chain.
If a process starts a subprocess, it should also be responsible for cleaning up the subprocesses and be allowed to do so in the correct order.

I understand that in some cases it might be beneficial to kill the whole subtree - such as if there is a chance of it getting stuck, but then this prevents correct cleanup for applications that do need to cleanup properly, making nodemon unusable for such projects.
That is why I am proposing to make this an option, for example: --only-signal-main

remy commented

Really the immediate question that jumps to mind is why is your main process responsible for spawning the database.

That aside, you can, in theory, achieve this yourself using the sigusr2 signal capture: https://github.com/remy/nodemon?tab=readme-ov-file#controlling-shutdown-of-your-script

Whether or not using SIGUSR2 works depends on the exact implementation of how specific spawned subprocess handles the signal (in fact not just the spawned subprocess, but any subprocess of that one as well), which is rarely documented, especially not for any transitive subprocesses (which might not be documented at all).

Also if the process simply does not handle the SIGUSR2 signal, then the default behavior is for it to terminate - which is exactly what I want to avoid.

Really the immediate question that jumps to mind is why is your main process responsible for spawning the database.

In my concrete case it is because the server normally connects to a real production database, but during development (which is also when I want to use nodemon) it spawns its own, correctly configured DB, so the developers don't need to set it up themselves - instead they just run the dev script that does it for them.

The question then might also be why not to start it outside nodemon... Well, to this I have two answers:

  • There would be need to communicate things like dynamically chosen port or auto-generated credentials to the process and this would need some bigger framework to do that nicely; wheres if the server is the one that manages db run, it gets these "for free"
  • Sometimes it is useful to run the database only in memory, or have multiple instances with independent dbs (such as during unit testing) - doing things this way just allows reusing the code everywhere instead of having a special solution for each path

Also note, that database is only one such example. Another (for me) common one is working with puppeteer - a framework for running browsers with automation. Browsers are notorious for spawning many subprocesses and there is no documentation on how they behave when some of them receive a specific signal, in arbitrary order.

remy commented

You have control of the top level process, so you can hook the signal. Have you tried this, to then gracefully handle your mongo-memory-thing then exit your own process?

I did - that is why I made this issue - it is not possible. My top process does hook the signal and does correctly handle it.
Problem is, that I don't have control about any child processes and the way nodemon does the signaling it signals all the children directly - so in a way that I literally can't hook without modifying the children.
Even making a wrapper around children is not a possible solution - because nodemon gets whole subtree and signals all processes in it, it will simply bypass any wrapper.

Ref:

nodemon/lib/monitor/run.js

Lines 403 to 429 in f8e3b8c

// we use psTree to kill the full subtree of nodemon, because when
// spawning processes like `coffee` under the `--debug` flag, it'll spawn
// it's own child, and that can't be killed by nodemon, so psTree gives us
// an array of PIDs that have spawned under nodemon, and we send each the
// configured signal (default: SIGUSR2) signal, which fixes #335
// note that psTree also works if `ps` is missing by looking in /proc
let sig = signal.replace('SIG', '');
psTree(child.pid, function (err, pids) {
// if ps isn't native to the OS, then we need to send the numeric value
// for the signal during the kill, `signals` is a lookup table for that.
if (!psTree.hasPS) {
sig = signals[signal];
}
// the sub processes need to be killed from smallest to largest
debug('sending kill signal to ' + pids.join(', '));
child.kill(signal);
pids.sort().forEach((pid) => exec(`kill -${sig} ${pid}`, noop));
waitForSubProcesses(child.pid, () => {
// finally kill the main user process
exec(`kill -${sig} ${child.pid}`, callback);
});
});
(only linux part for simplicity, same logic applies to Windows path)
The comments in the code literally say that - "we use psTree to kill the full subtree of nodemon".
What I'm looking for is having a flag that will prevent this behavior and instead only "kill" the parent - letting it do the cleanup for any potential children.

If you agree with that, I would be willing to make a PR that implements this behavior.

remy commented

Okay, so, the code isn't really doing what the author expects: if I trap the signal, I don't expect nodemon to signal my subprocesses.

I'm not keen on adding a flag to solve this (see design principles), but this may be the only way to solve the issue.

The problem I see is that nodemon has no way to know if the signal was trapped, or more importantly: not. Because if the signal is not trapped, then nodemon needs to kill the full process tree.

The only way I've seen (and I'm not even sure it's reliable) is to listen for the exit event on the spawned process and check the exit code matches the signal - except this only happens once the process has exited and that's a whole big problem that nodemon and contributors filing issues have been trying to get right for years.

Do you have thoughts on how it can be archived without a flag?

In the optimal world it would actually be the other way around than it is currently - the behavior of nodemon should be to only kill the parent - in that case any child of that parent should request a SIGHUP signal when its parent dies and exit by itself - the responsibility should be on the child.
The whole problem is, that some programs don't do this (it actually is moderately hard to do it correctly without any race conditions) and the current workaround might be needed in specific cases.
The current behavior basically prefers doing the "non-ideal" thing, thinking all processes will not handle this correctly.

One possible solution could be getting a subprocess tree of the main process, then killing only the main process, waiting a short timeout, and if any child is still alive then force-killing those (because by the time parent exits it should have cleaned up the children already) - no matter the signal/code the parent exited with.
There are however other problems with this approach - namely race conditions. If the parent starts a new child process between the resolution of the process tree and the kill (no matter if before or after the signal is sent/delivered), then these new processes won't be killed, because by that point there is no way to find them.

I honestly think that a flag is the best solution here - what you are talking about (detecting if process does handle the signal) is in a way also a flag - it is just set by the OS (if you do manage to get it) instead of being set explicitly by the user.
I think having a flag for "I want to handle the signal myself" is better solution than changing this behavior implicitly based on this info. It also goes the most in the design direction of "Let individuals build on top of nodemon".
Also that info might be unreliable and obscure for the user - if there is a framework around the application, then the framework itself might handle the signal without the actual developer handling it - so the resulting behavior would be unexpected for the developer...
And there definitely isn't a way to know if it was the developer who got to handle the signal or if it was some framework.

What I mean is: Node.js for example by itself always handles the signal - it is just matter that if there is no JS callback for it, then it proceeds to shut down by itself. From the view of the OS it still however is a handled signal not resulting in directly killed process.

Yet another possible solution would be going closer to the containerization frameworks (e.g. how Docker is able to reliably stop the whole container no matter what processes and when are started inside it).
On linux this could be achieved by becoming a child subreaper (see prctl(2), PR_SET_CHILD_SUBREAPER) - which would cause any orphaned children to be re-parented to this process, so it could then collect them and "reap" them when desired (instead of the orphans being re-parented to 1/init and being undiscoverable after that by nodemon).
While this solution would solve this issue, it is very platform dependent (I don't have nearly enough Windows knowledge to say how one could approach this there) and it would almost definitely require having a native binary to do this.

Yet another possible solution would be to literally use a container to run the development server(s) and their children - it is a technology that already exists and is very well tested. I however think, that that goes way past of what Nodemon is intended to do...

This issue has been automatically marked as idle and stale because it hasn't had any recent activity. It will be automtically closed if no further activity occurs. If you think this is wrong, or the problem still persists, just pop a reply in the comments and @remy will (try!) to follow up.
Thank you for contributing <3

[not stale]

Hi @remy! Based on my above answers, do you agree that the simplest course of action that does solve the issue is adding a new flag? Would you accept PRs implementing this?

remy commented

This issue has been automatically marked as idle and stale because it hasn't had any recent activity. It will be automtically closed if no further activity occurs. If you think this is wrong, or the problem still persists, just pop a reply in the comments and @remy will (try!) to follow up.
Thank you for contributing <3

Automatically closing this issue due to lack of activity