jnr/jnr-process

Performance problem when reading the command's output after ProcessBuilder executes it

young-yangyong opened this issue · 3 comments

        ProcessBuilder processBuilder = new ProcessBuilder("ls");
        Process process = processBuilder.start();

        process.waitFor();
        // Reading the result of the ls command into a 1,000,000,000-byte array is very slow.
        // While debugging, most of the time was spent in process.getInputStream().read():
        // the read appears to touch the entire byte array even though the actual output is
        // under 100 bytes, leaving garbage values past that index. In the same environment,
        // Runtime.getRuntime().exec() does not have this problem and is very fast.
        byte[] bytes = new byte[1000000000];
        long l = System.currentTimeMillis();
        int len = process.getInputStream().read(bytes);
        System.out.println(System.currentTimeMillis() - l);
        System.out.println(new String(bytes, 0, len));

I got the following output running this against the jnr-process directory:

762
LICENSE
META-INF
README.md
jnr-process.iml
pom.xml
src
target

I assume the slow read time you are referring to is the 700-something milliseconds?

My guess would be that this is due to jnr-enxio eagerly allocating a 1GB native buffer for this read, which is partially populated and then copied to the heap array.

The allocation of that native buffer might be difficult to avoid, since you have requested to read up to 1GB in a single read operation.

If the copy is attempting to copy all 1GB of data to the heap array, that could possibly be improved, but I have not confirmed this is the case.

This modified version of your code does the read quickly (<20ms) by allocating the native 1GB buffer in advance. Of course, the 1GB buffer still requires time to allocate (and zero out, I think):

ProcessBuilder processBuilder = new ProcessBuilder("ls");
Process process = processBuilder.start();
process.waitFor();
ByteBuffer bytes = ByteBuffer.allocateDirect(1000000000);
long l = System.currentTimeMillis();
((ReadableByteChannel) process.getIn()).read(bytes);
bytes.flip();
System.out.println(System.currentTimeMillis() - l);
System.out.println(StandardCharsets.UTF_8.decode(bytes));

The difference in performance from Runtime.getRuntime().exec() is because Runtime does the read in small pieces into a separate buffer, rather than allocating one large array. jnr-process is intended to give you direct access to the streams of the child process, so it is up to you to read the data iteratively. If you ask us to read 1GB of data, we will allocate a 1GB buffer and try to perform a single read.

Does this make sense?

A simpler explanation:

Your code requests a read of 1GB of data. In order to perform a single read, we must allocate 1GB of native memory in which to store that data.

The logic in Runtime.getRuntime().exec() handles child processes differently, draining their stdout streams in small chunks and buffering that data on the heap. This reduces the amount of native memory required, but does not allow you to control how many read operations happen, nor give you access to select and other direct NIO operations.

If you want the same performance as Runtime.getRuntime().exec(), you should use a smaller buffer (either native or on the JVM heap) and read the data in several pieces, rather than performing one large read that forces the allocation of a large native buffer.
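To illustrate the suggestion above, here is a minimal sketch of chunked reading. It uses the JDK's java.lang.ProcessBuilder (not jnr-process) so that it runs standalone, and runs `echo hello` instead of `ls` so the output is predictable; the same loop applies to the stream or channel returned by jnr-process. The 8 KB buffer size is an arbitrary choice for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class ChunkedRead {
    public static void main(String[] args) throws Exception {
        // Assumption: using the JDK ProcessBuilder here so the example is
        // self-contained; substitute the jnr-process stream/channel as needed.
        Process process = new ProcessBuilder("echo", "hello").start();
        try (InputStream in = process.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192]; // small reusable buffer
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);    // accumulate output on the heap
            }
            process.waitFor();
            System.out.print(out.toString("UTF-8"));
        }
    }
}
```

Because each read requests at most 8 KB, only a small native buffer is needed per read, no matter how much total output the child process produces.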

I understand. Thank you for your reply!