It turns out that there’s been a change in Java 15, referenced in this r/clojure reddit thread that disables a relatively ancient feature called BiasedLocking. User pron98 mentioned the possibility that the JVM Clojure implementation may be susceptible to the deprecation of this feaure and subsequently incur a performance hit. Perhaps this would necessitate changes to data structure implementations if the hit is bad enough. The conversation focused on lazy sequences as a likely indicator of performance regression.
Rather than speculate, we can go measure. To wit, I put together this small project that can be run as an uberjar. The core measurement is just reducing over a lazy sequence, arbitarily chosen to be 10^5 entries:
(defn seq-range [n]
(take n (iterate inc 0)))
(defn seq-smash []
(reduce + (seq-range 100000)))
u/alexdmiller pointed out that the original benchmark focused on reduceing over
a seq
coerced range, but this was faulty since LongRange implements seq
directly and returns to fast internal reduction path, with no lazy sequences.
To get around this, we generate an equivalent lazy sequence using
take
and iterate
.
user=> (type (take 2 (iterate inc 0)))
clojure.lang.LazySeq
The eager reduction of the lazy sequence will hopefully provide a useful indicator of possible performance regression, if lazy sequences alone are a prime target for regression.
User bsless pointed out that there are likely performance regressions between
g1gc (the default gc after Java 9) and Java 8’s parallel gc default. Being
primarily a Java 8 target, I did not think of this originally, so I have
included UseParallelGC
as a factor in the experimental design. So we will
explore performance on my legacy JDK 8, and the most current JDK 15, along with
command line directives to enable biased locking and parallel gc.
The program uses criterium
to generate sample data and estimate performance,
including warmup time for the JIT. Upon invocation as an uberjar, the VM
properties of java.runtime.version
and java.vendor
are prepended to the
reported benchmark results.
Since I plan to test on Java 8, the uberjar intentionally uses a minor shim class for its `-main` entry point to minimize the actual amount of AOT’d bytecode, which should amount to a very portable implementation (in case of bytecode differences between VM versions). Thus, performance measures will also include Clojure loading and evaluating the namespace containing the actual benchmark function, and crtierium.
These are the results of running the sample program via the invocations presented.
OS | java.runtime.version | java.vendor | args | Execution Time Mean |
---|---|---|---|---|
Windows 10 | 1.8.0_222-b10 | AdoptOpenJDK | 8.217622 ms | |
Windows 10 | 15+36-1562 | Oracle Corporation | 15.559892 ms | |
Windows 10 | 15+36-1562 | Oracle Corporation | -XX:+UseBiasedLocking | 15.606927 ms |
Windows 10 | 15+36-1562 | Oracle Corporation | -XX:+UseParallelGC | 8.548820 ms |
Windows 10 | 15+36-1562 | Oracle Corporation | -XX:+UseBiasedLocking -XX:+UseParallelGC | 8.619192 ms |
Despite a different benchmark, the overall observations are the same.
ParallelGC seems to be far more important than biased locking here. When we compare Java 8 (ParallelGC by default) with Java15 using ParallelGC, the results are in the same magnitude, but favor Java 15 (likely by virtue of the benefit of modern optimizations accrued by the newer VM). Using the parallel gc seems to be about ~2x (1.895) faster here (interesting). I am not versed in GC operations enough to understand this seemingly clear 2x result.
In the pure JDK 15 comparison, using the default G1GC without BiasedLocking is
about %5 slower 0.3% faster in this microbenchmark. There appears to be
a minor gain in the new JDK without biased locking, although it’s still slower
than the original Java 8 path with seemingly commensurate options.
For this microbench, perhaps the deprecation of BiasedLocking is far less
impactful for the immediate future of Clojure performance when consuming lazy
sequences than the choice of GC is. I leave it to the reader to determine if an
apparent ~5% hit 0.3% gain matters. It’s unclear how well this holds in the
aggregate, such as a complex application. Real world measures would be highly
informative.
1.8.0_222-b10
AdoptOpenJDK
Evaluation count : 78 in 6 samples of 13 calls.
Execution time mean : 8.217622 ms
Execution time std-deviation : 115.494655 ╡s
Execution time lower quantile : 8.097967 ms ( 2.5%)
Execution time upper quantile : 8.386987 ms (97.5%)
Overhead used : 2.073030 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
java -jar test.jar
15+36-1562
Oracle Corporation
Evaluation count : 42 in 6 samples of 7 calls.
Execution time mean : 15.559892 ms
Execution time std-deviation : 157.433014 ╡s
Execution time lower quantile : 15.368435 ms ( 2.5%)
Execution time upper quantile : 15.732033 ms (97.5%)
Overhead used : 8.248577 ns
java -XX:+UseBiasedLocking -jar test.jar
OpenJDK 64-Bit Server VM warning: Option UseBiasedLocking was deprecated in version 15.0 and will likely be removed in a future release.
15+36-1562
Oracle Corporation
Evaluation count : 42 in 6 samples of 7 calls.
Execution time mean : 15.606927 ms
Execution time std-deviation : 704.972726 ╡s
Execution time lower quantile : 15.079177 ms ( 2.5%)
Execution time upper quantile : 16.798572 ms (97.5%)
Overhead used : 8.333356 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
java -XX:+UseParallelGC -jar test.jar
15+36-1562
Oracle Corporation
Evaluation count : 72 in 6 samples of 12 calls.
Execution time mean : 8.548820 ms
Execution time std-deviation : 90.448747 ╡s
Execution time lower quantile : 8.439539 ms ( 2.5%)
Execution time upper quantile : 8.663734 ms (97.5%)
Overhead used : 2.367649 ns
java -XX:+UseBiasedLocking -XX:+UseParallelGC -jar test.jar
OpenJDK 64-Bit Server VM warning: Option UseBiasedLocking was deprecated in version 15.0 and will likely be removed in a future release.
15+36-1562
Oracle Corporation
Evaluation count : 72 in 6 samples of 12 calls.
Execution time mean : 8.619192 ms
Execution time std-deviation : 114.538524 ╡s
Execution time lower quantile : 8.477689 ms ( 2.5%)
Execution time upper quantile : 8.726289 ms (97.5%)
Overhead used : 2.370110 ns