Improve Safe Compression/Decompression Performance with Java9 Features
Tim-Brooks opened this issue · 1 comments
In Elasticsearch we have recently introduced lz4 as a compression option. We run with the SecurityManger
enabled and generally disallow unsafe usage. So our usage of lz4-java is the safe variety. After exploring, we have determined that adding the usage of Arrays#mismatch
and VarHandles
significantly improve the performance of compression and decompression. I introduced a temporarily forked version lz4-java which uses these methods here.
My JMH benchmarks of compressing and decompressing highly compressible JSON data on m5d.4xlarge;
Benchmark Mode Cnt Score Error Units
MyBenchmark.testCompressLZ4Java thrpt 15 536.109 ± 1.879 ops/s
MyBenchmark.testCompressLZ4JavaForked thrpt 15 907.001 ± 1.124 ops/s
MyBenchmark.testCompressLZ4JavaUnsafe thrpt 15 1096.882 ± 39.805 ops/s
MyBenchmark.testDecompressLZ4Java thrpt 15 1513.808 ± 5.423 ops/s
MyBenchmark.testDecompressLZ4JavaForked thrpt 15 3073.232 ± 47.876 ops/s
MyBenchmark.testDecompressLZ4JavaUnsafe thrpt 15 2508.444 ± 83.889 ops/s
I did not benchmark checksumming as it is not relevant to our use case, but using VarHandles
will also improve performance presumably. As an FYI, I think the unsafe decompression is currently hurt by this issue here.
Obviously lz4-java must support Java8 (and currently supports Java7). I explored MethodHandles as a way to access the Java9 methods. I have opened a POC PR against a version of Elasticsearch which still supports Java8. If you look at the forked SafeUtils
and LZ4SafeUtils
classes you can see how MethodHandles
can be used to call VarHandles
when they are available. I benchmarked these changes and they have similar performance as direct Java9 usage.
Benchmark Mode Cnt Score Error Units
MyBenchmark.testCompressLZ4Java thrpt 15 540.005 ± 6.667 ops/s
MyBenchmark.testCompressLZ4JavaForked thrpt 15 1014.770 ± 3.755 ops/s
MyBenchmark.testCompressLZ4JavaForkedMH thrpt 15 1019.950 ± 14.563 ops/s
MyBenchmark.testCompressLZ4JavaForkedMHFallback thrpt 15 532.644 ± 2.028 ops/s
MyBenchmark.testDecompressLZ4Java thrpt 15 1518.632 ± 28.043 ops/s
MyBenchmark.testDecompressLZ4JavaForked thrpt 15 3106.815 ± 82.369 ops/s
MyBenchmark.testDecompressLZ4JavaForkedMH thrpt 15 3133.160 ± 10.002 ops/s
MyBenchmark.testDecompressLZ4JavaForkedMHFallback thrpt 15 1407.144 ± 9.146 ops/s
The fallback benchmarks are the performance with the MethodHandle
approach when the VarHandles
are not available in the JDK. The decompress fallback version had a minor performance degradation, but I think that can be resolved with a slightly different approach (which I would use in a PR to lz4-java).
I am raising this issue to see if there is an interest in me contributing the MethodHandle
variant back. In order to support Java9 lz4-java could also use multi version release JARs. I prototyped this, but the bnd library that lz4-java uses to produce OSGI compatible bundles does not support the multi release format.
Based on that (and my limited knowledge of OSGI) it seems like MethodHandles
are the best way forward. Let me know if you are interested. I can contribute the changes I have put together in Elasticsearch. Additionally, I will expand them to make sure they apply to all components (compression, decompression, and checksumming).
This is related to #151.