/NuProcess

Low-overhead, non-blocking I/O, external Process implementation for Java

Primary LanguageJavaApache License 2.0Apache-2.0

NuProcess

NuProcess is proud to power Facebook's Buck build.  

A low-overhead, non-blocking I/O, external Process execution implementation for Java. It is a replacement for java.lang.ProcessBuilder and java.lang.Process.

Have you ever been annoyed by the fact that whenever you spawn a process in Java you have to create two or three "pumper" threads (for every process) to pull data out of the stdout and stderr pipes and pump data into stdin? If your code starts a lot of processes you can have dozens or hundreds of threads doing nothing but pumping data.

NuProcess uses the JNA library to use platform-specific native APIs to achive non-blocking I/O on the pipes between your Java process and the spawned processes:

  • Linux: uses epoll
  • MacOS X: uses kqueue/kevent
  • Windows: uses IO Completion Ports

Maven

<dependency>
    <groupId>com.zaxxer</groupId>
    <artifactId>nuprocess</artifactId>
    <version>1.1.2</version>
    <scope>compile</scope>
</dependency>

👉 Note: Versions 1.0.0 and above contain breaking API changes from 0.9.7.

It's mostly about the memory

Speed-wise, there is not a significant difference between NuProcess and the standard Java Process class, even when running 500 concurrent processes. On some platforms such as MacOS X or Linux, NuProcess is 20% faster than java.lang.Process for large numbers of processes.

However, when it comes to memory there is a significant difference. The overhead of 500 threads, for example, is quite large compared to the one or few threads employed by NuProcess.

Additionally, on unix-based platforms such as Linux, when creating a new process java.lang.Process uses a fork()/exec() operation. This requires a temporary copy of the Java process (the fork), before the exec is performed. When running tests on Linux, in order to spawn 500 processes required setting the JVM max. memory to 3Gb (-Xmx3g). NuProcess uses a variant of fork() called vfork(), which does not impose this overhead. NuProcess can comfortably spawn 500 processes even when running the JVM with only 128Mb.

Example

Like the Java ProcessBuilder, NuProcess offers NuProcessBuilder, so building a process is fairly simple. Let's make a simple example where we use the Unix "cat" command. When launched with no parameters, cat reads from STDIN and echos the output to STDOUT. We're going to start the cat process, write "Hello world!" to its STDIN, and read the echoed reply from STDOUT and print it. Let's build and start the process.

NuProcessBuilder pb = new NuProcessBuilder(Arrays.asList("/bin/cat"));
ProcessHandler handler = new ProcessHandler();
pb.setProcessListener(handler);
NuProcess process = pb.start();
process.wantWrite();
process.waitFor(0, TimeUnit.SECONDS); // when 0 is used for waitFor() the wait is infinite

You'll notice the ProcessHandler in code above. This is a class you provide which receives callbacks from the process to handle input, output, termination, etc. And notice the wantWrite() call, this expresses that we have something we want to write to the process, so our ProcessHandler will be called back to perform the write. Here's what ProcessHandler looks like for our example:

class ProcessHandler extends NuAbstractProcessHandler {
   private NuProcess nuProcess;

   @Override
   public void onStart(NuProcess nuProcess) {
      this.nuProcess = nuProcess;
   }
   
   @Override
   public boolean onStdinReady(ByteBuffer buffer) {
      buffer.put("Hello world!".getBytes());
      buffer.flip();
      return false; // false means we have nothing else to write at this time
   }

   @Override
   public void onStdout(ByteBuffer buffer, boolean closed) {
      if (!closed) {
         byte[] bytes = new byte[buffer.remaining()];
         // You must update buffer.position() before returning (either implicitly,
         // like this, or explicitly) to indicate how many bytes your handler has consumed.
         buffer.get(bytes);
         System.out.println(new String(bytes));

         // For this example, we're done, so closing STDIN will cause the "cat" process to exit
         nuProcess.closeStdin(true);
      }
   }
}

Synchronous Operation

NuProcess does allow you to perform synchronous writes to the stdin of the spawned process. Even though the writes are synchronous they are non-blocking; meaning the write returns immediately. In this model, you do not use NuProcess.wantWrite() and your onStdinReady() method will not be called. If you extend the NuAbstractProcessHandler you do not need to provide an implementation of onStdinReady(). Use the NuProcess.writeStdin() method to write data to the process. This method will return immediately and writes are queued and occur in order. Read the JavaDoc for the NuProcess.writeStdin() method for cautions and caveats.

In the synchronous model, the above example would look like this:

NuProcessBuilder pb = new NuProcessBuilder(Arrays.asList("/bin/cat"));
ProcessHandler handler = new ProcessHandler();
pb.setProcessListener(handler);
NuProcess process = pb.start();

ByteBuffer buffer = ByteBuffer.wrap("Hello, World!".getBytes());
process.writeStdin(buffer);

process.waitFor(0, TimeUnit.SECONDS); // when 0 is used for waitFor() the wait is infinite

And the handler:

class ProcessHandler extends NuAbstractProcessHandler {
   private NuProcess nuProcess;

   @Override
   public void onStart(NuProcess nuProcess) {
      this.nuProcess = nuProcess;
   }
   
   public void onStdout(ByteBuffer buffer, boolean closed) {
      if (!closed) {
         byte[] bytes = new byte[buffer.remaining()];
         // You must update buffer.position() before returning (either implicitly,
         // like this, or explicitly) to indicate how many bytes your handler has consumed.
         buffer.get(bytes);
         System.out.println(new String(bytes));

         // For this example, we're done, so closing STDIN will cause the "cat" process to exit
         nuProcess.closeStdin(true);
      }
   }
}

JavaDocs

You can read the JavaDoc here. Make sure you read and fully understand the JavaDoc for the NuProcessHandler interface as it is your primary contract with NuProcess.

Settings

These are settings that can be defined as System properties that control various behaviors of the NuProcess library. You typically do not need to modify these.

com.zaxxer.nuprocess.threads

This setting controls how many threads are used to handle the STDIN, STDOUT, STDERR streams of spawned processes. No matter how many processes are spawned, this setting will be the maximum number of threads used. Possible values are:

  • auto (default) - this sets the maximum number of threads to the number of CPU cores divided by 2.
  • cores - this sets the maximum number of threads to the number of CPU cores.
  • <number> - the sets the maximum number of threads to a specific number. Often 1 will provide good performance even for dozens of processes.

The default is auto, but in reality if your child processes are "bursty" in their output, rather than producing a constant stream of data, a single thread may provide equivalent performance even with hundreds of processes.

com.zaxxer.nuprocess.softExitDetection

On Linux and Windows there is no method by which you can be notified in an asynchronous manner that a child process has exited. Rather than polling all child processes constantly NuProcess uses what we call "Soft Exit Detection". When a child process exits, the OS automatically closes all of it's open file handles; which is something about which we can be notified. So, on Linux and Windows when NuProcess determines that both the STDOUT and STDERR streams have been closed in the child process, that child process is put into a "dead pool". The processes in the dead pool are polled to determine when they have truly exited and what their exit status was. See com.zaxxer.nuprocess.deadPoolPollMs

The default value for this property is true. Setting this value to false will completely disable process exit detection, and the ``NuProcess.waitFor()" API MUST be used. Failure to invoke this API on Linux will result in an ever-growing accumulation of "zombie" processes and eventually an inability to create new processes. There is very little reason to disable soft exit detection unless you have child process that itself closes the STDOUT and STDERR streams.

com.zaxxer.nuprocess.deadPoolPollMs

On Linux and Windows, when Soft Exit Detection is enabled (the default), this property controls how often the processes in the dead pool are polled for their exit status. The default value is 250ms, and the minimum value is 100ms.

com.zaxxer.nuprocess.lingerTimeMs

This property controls how long the processing thread(s) remains after the last executing child process has exited. In order to avoid the overhead of starting up another processing thread, if processes are frequently run it may be desirable for the processing thread to remain (linger) for some amount of time (default 2500ms).

Related Projects

Charles Duffy has developed a Clojure wrapper library here. Julien Viet has developed a Vert.x 3 library here.

Limitations

The following limitations exist in NuProcess:

  • Currently only supports Linux, Windows, and MacOS X.
  • Java 7 and above
  • Linux support requires at least kernel version 2.6.17 or higher (kernels after June 2006)