Signal handling for epoll_wait can leak processes
bturner opened this issue · 3 comments
Unexpected signal delivery to the JVM while NuProcess's ProcessEpoll is blocked in epoll_wait can result in it leaking zombie processes.
Per signal(7):
The following interfaces are never restarted after being interrupted by
a signal handler, regardless of the use of SA_RESTART; they always fail
with the error EINTR when interrupted by a signal handler:
* File descriptor multiplexing interfaces: epoll_wait(2),
epoll_pwait(2), poll(2), ppoll(2), select(2), and pselect(2).
This means even if the JVM registers signal handlers with SA_RESTART, epoll_wait will not be restarted.
Looking through JDK's native code at places where it uses epoll_wait, it wraps them in a RESTARTABLE macro (defined here) which detects a -1 return paired with EINTR and re-runs epoll_wait. ProcessEpoll should have something similar.
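A minimal sketch of what such a retry could look like on the Java side. The class name, the helper name restartable, and the two IntSupplier parameters are hypothetical stand-ins for however ProcessEpoll actually invokes epoll_wait and reads errno; only the "retry while -1 and EINTR" logic comes from the description above.

```java
import java.util.function.IntSupplier;

// Hypothetical helper, not part of NuProcess or the JDK.
final class RestartableSketch {
    // EINTR is 4 on Linux; other platforms may differ.
    static final int EINTR = 4;

    /**
     * Runs the call, and re-runs it for as long as it returns -1
     * with errno set to EINTR. Any other result is returned as-is.
     *
     * @param call  stand-in for the native epoll_wait invocation
     * @param errno stand-in for reading errno after the call
     */
    static int restartable(IntSupplier call, IntSupplier errno) {
        int rc;
        do {
            rc = call.getAsInt();
        } while (rc == -1 && errno.getAsInt() == EINTR);
        return rc;
    }
}
```

With something like this in place, a -1 that survives the loop is a real failure (errno other than EINTR) and can be reported as one.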
Debugging this leak was quite challenging due to NuProcess's (lack of) error handling.
ProcessEpoll throws a RuntimeException with a minimal message and no errno to indicate what the error was, but that doesn't really matter because BaseEventProcessor.run catches the exception and aborts without any logging.
I ended up expanding the exception message and adding some logging to finally get to the bottom of it.
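Roughly the kind of change I mean, as a sketch only. EpollFailure and runStep are hypothetical names, and java.util.logging is an assumption here; this is not the actual NuProcess code, just the shape of "put errno in the message and log before aborting".

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not the NuProcess implementation.
final class EventLoopLoggingSketch {
    private static final Logger LOGGER =
            Logger.getLogger(EventLoopLoggingSketch.class.getName());

    /** Exception that records which call failed and with what errno. */
    static final class EpollFailure extends RuntimeException {
        final int errno;

        EpollFailure(String operation, int rc, int errno) {
            super(operation + " failed: rc=" + rc + ", errno=" + errno);
            this.errno = errno;
        }
    }

    /**
     * Runs one iteration of an event loop. On failure the exception is
     * logged before the loop is told to stop, so the abort is visible.
     *
     * @return true to keep looping, false to abort
     */
    static boolean runStep(Runnable step) {
        try {
            step.run();
            return true;
        } catch (RuntimeException e) {
            LOGGER.log(Level.SEVERE, "Event loop aborting", e);
            return false;
        }
    }
}
```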
If the stdout or stderr buffers fill up, NuProcess also ends up leaking those as zombies due to this check (which triggers the same log-less abort in BaseEventProcessor).
There's documentation that those buffers shouldn't be allowed to fill, and I've got code that prevents it--except in one case: Where I'm in an onStdout (or onStderr) callback and I call NuProcess.destroy. In that case, I (wrongly) assumed the state of the buffer wouldn't matter, but it still does. I've since updated my handlers to purge the buffers after calling destroy to prevent zombies.
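For anyone hitting the same thing, the purge amounts to marking everything in the callback's ByteBuffer as consumed. A minimal sketch, assuming the handler receives a java.nio.ByteBuffer in read mode; the class and the helper name discardRemaining are hypothetical, and the wiring into the handler is only described in the comment.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for use inside an onStdout/onStderr callback.
final class BufferPurgeSketch {
    /**
     * Marks every unread byte in the buffer as consumed by moving the
     * position to the limit, so nothing is left over to fill the buffer.
     * Intended to be called right after destroy in the callback
     * (assumption about the handler wiring).
     */
    static void discardRemaining(ByteBuffer buffer) {
        buffer.position(buffer.limit());
    }
}
```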
@bturner Pull request welcome. 😉
Always, sir. I'll have one up shortly.