Descriptors/handles are leaked when processes fail to start
bturner opened this issue · 2 comments
The exact codepaths vary, but for every supported platform there are cases where a failure to start a process leaks descriptors or handles.
For example, on Linux, if forkAndExec
throws an exception, closePipes()
is never called and the 3 *Widow
descriptors are leaked. You can confirm this with NuProcess's own test suite:
- Put a breakpoint in
RunTest.noExecutableFound()
after the call topb.start()
- When the breakpoint is hit, run
lsof
on thejava
process
At the end of the lsof
output will be something like the following (the IDs will likely vary, of course)
java 3858 bturner 24r FIFO 0,12 0t0 119418 pipe
java 3858 bturner 27w FIFO 0,12 0t0 119419 pipe
java 3858 bturner 29w FIFO 0,12 0t0 119420 pipe
If you use the debugger to manually call process.closePipes()
and then run lsof
again, you'll see that the 3 pipes are closed.
That specific codepath is handled correctly on macOS and all the pipes are closed, but if there are any errors in createPosixPipes()
after it calls createPipes()
, since createPosixPipes()
is called outside the try
/catch
in run
and start
, the pipes are leaked. (This case is much less likely that the Linux one, to be clear; I'm just noting its an edge case that exists.)
WindowsProcess
has similar issues with handles it creates in createPipes()
.
I'm hoping to put together a pull request (or pull requests) for these issues in the next couple days.
The Linux issue with leaked pipe descriptors has actually caused an outage for us in production internally, as well as for a fairly large customer. In both cases the system ran out of file descriptors through no direct fault of NuProcess's; Bitbucket Server was simply running too many concurrent processes, but once that happens it's basically impossible for the system to ever recover without a restart because each new failure to start a process leaks 3 more pipe handles. Similar situations with ProcessBuilder
can recover on their own; once the load subsides and the number of concurrent processes drops, things are stable again. With the pipe leaks, even as load declines the number of free descriptors doesn't increase.
Raised pull request #120 for the Linux pipe part of the fix. The issues with macOS and Windows are much more subtle (on Windows I'm actually not certain there are any issues, at this point), and would involve failures of system calls other than posix_spawnp
or CreateProcessW
, so I'm not certain they're worth the complexity of the changes it would take to fix them.