brettwooldridge/NuProcess

Signal handling for epoll_wait can leak processes

bturner opened this issue · 3 comments

Unexpected signal delivery to the JVM while NuProcess's ProcessEpoll is blocked in epoll_wait can result in it leaking zombie processes.

Per signal(7):

       The following interfaces are never restarted after being interrupted by
       a signal handler, regardless of the use of SA_RESTART; they always fail
       with the error EINTR when interrupted by a signal handler:

       * File     descriptor     multiplexing    interfaces:    epoll_wait(2),
         epoll_pwait(2), poll(2), ppoll(2), select(2), and pselect(2).

This means even if the JVM registers signal handlers with SA_RESTART, epoll_wait will not be restarted.

Looking through JDK's native code at places where it uses epoll_wait, it wraps them in a RESTARTABLE macro (defined here) which detects a -1 return paired with EINTR and re-runs epoll_wait. ProcessEpoll should have something similar.

Debugging this leak was quite challenging due to NuProcess's (lack of) error handling.

I ended up expanding the exception message and adding some logging to finally get to the bottom of it.

If the stdout or stderr buffers fill up, NuProcess also ends up leaking those as zombies due to this check (which triggers the same log-less abort in BaseEventProcessor).

There's documentation that those buffers shouldn't be allowed to fill, and I've got code that prevents it--except in one case: Where I'm in an onStdout (or onStderr) callback and I call NuProcess.destroy. In that case, I (wrongly) assumed the state of the buffer wouldn't matter, but it still does. I've since updated my handlers to purge the buffers after calling destroy to prevent zombies.

@bturner Pull request welcome. 😉

Always, sir. I'll have one up briefly