java-native-benchmark

JMH performance benchmark for Java's native call APIs: JNI (via JavaCpp ), JNA, JNR, Bridj and Project Panama Foreign Memory Access and Foreign Linker APIs.

Original README last updated April 9, 2021 CosmicDan updates 2023-03-03

Fork changes

This fork has been updated and modified. The changes since original work are as follows:

Updated to Java 19 (new requirement);
Panama tests completely rewritten;
Updated JavaCpp, JNA and JNR to latest releases (as of writing);
Removed "NoCall" benchmarks, not as useful as new benchmarks
Renamed existing benchmarks to JmhGetSystemTimeSeconds_*_alloc and _preAlloc, see "Types of benchmarks" below for details
Optimized existing tests where available
Added some new benchmarks

Benchmark operation

Types of benchmarks

Each benchmarked function may consist of either/any/all of the following:

_alloc: Native calls will (de)allocate memory every time it is called. JVM calls will consist of object creation. alloc might be useful for judging "one shot" method calls.
_preAlloc: Native calls use a pre-allocated memory space. JVM calls have zero object creation. preAlloc might be useful for judging function calls that you can afford to reserve permanent space in memory for (or do your own memory management).

TODO: More readme updates for new stuff (much of below is out of date)

GetSystemTime(out)void function

Get seconds from the current system time using native call to Windows API function GetSystemTime provided by kernel32.dll:

void GetSystemTime(LPSYSTEMTIME lpSystemTime);

with the data structure defined as

typedef struct _SYSTEMTIME {
  WORD wYear;
  WORD wMonth;
  WORD wDayOfWeek;
  WORD wDay;
  WORD wHour;
  WORD wMinute;
  WORD wSecond;
  WORD wMilliseconds;
} SYSTEMTIME, *PSYSTEMTIME, *LPSYSTEMTIME;

Each implementation will

allocate memory for the SYSTEMTIME struct
call native method GetSystemTime passing the allocated memory
extract and return the value from the field wSecond

~~In a separate benchmark I measured performance of the native call only (item 2).~~ … this information is no longer relevant. Refer to "Types of Benchmarks" above.

JNI
JNI is a Java's standard way to call native code present in JDK since its early versions. JNI requires building a native stub as an adapter between Java and native library, so is considered low-level. Helper tools have been developed in order to automate and simplify native stub generation. Here I used JavaCpp, the project is known for prebaking Java wrappers around high-performant C/C++ libraries such as OpenCV and ffmpeg. JavaCpp comes with ready-to-use wrappers for widely used system libraries, including Windows API lib, so I used them in this benchmark.

JNA
JNA resolves the burden of writing native wrapper by using a native stub that calls the target function dynamically. It only requires writing Java code and provides mapping to C structs and unions, however, for complex libraries writing Java API that matched a native lib's C API still might be a big task. JNA also provides prebaked Java classes for Windows API. Wrapping the calls dynamically results in high performance overhead comparing to JNI.

JNA Direct
JNA's direct mode claims to "improve performance substantially, approaching that of custom JNI". That should be well seen then calls are using mostly primitive types for arguments and return values.

BriJ
Bridj is an attempt to provide a Java to Cpp interop solution similar to JNA (without a need of writing and compiling native code), it claims to provide better performance using dyncall and hand-optimized assembly tweaks. A tool named JNAerator helps to generate java classed from the native library headers. The Bridj projects seems to be abandoned now.

JNR
JNR is a comparingly young project that target the same problem. Similarly as JNA or Bridj it does not require native programming. There's not much documentation or reviews at the moment, but JNR is often called promising.

Project Panama
Project Panama aims to simplify the existing complexity with Java to C interop on JDK level. It is still under development, but Foreign Memory Access and Foreign Linker APIs are already available is openjdk 16.

Pure Java
For comparison, the same problem was implemented with JDK's java.util.Date, java.util.Calendar and java.time.LocalDateTime

How to run

Make sure that gradle is configured (e.g. PATH and/or JAVA_HOME) with a JDK 19 (or later) and run

gradlew clean jmh

Results

System:

Intel Core i5-6500 @ 3.20 GHz / Windows 10 / openjdk-16

Full benchmark (average time, smaller is better)

JmhGetSystemTimeSeconds.jnaDirect           2962.544 ± 191.795  ns/op
JmhGetSystemTimeSeconds.jna                 2889.632 ± 173.064  ns/op
JmhGetSystemTimeSeconds.bridj                937.159 ±  59.353  ns/op
JmhGetSystemTimeSeconds.jnr                  362.979 ±   3.560  ns/op
JmhGetSystemTimeSeconds.panama               242.100 ±   2.240  ns/op
JmhGetSystemTimeSeconds.jni_javacpp          216.767 ±   2.239  ns/op
JmhGetSystemTimeSeconds.java_calendar        173.949 ±   3.707  ns/op
JmhGetSystemTimeSeconds.java_localdatetime    70.926 ±   0.670  ns/op
JmhGetSystemTimeSeconds.java_date             63.818 ±   2.434  ns/op

JNA looks expectedly slow (x13 slower that JNI). JNA direct appears even slower, as probably mapping the struct from C to Java consumes the most of operation's time.

Trending JNR appears faster than outdated Bridj, yet staying behind JNI.

Panama APIs demonstrate performance comparable to that of JNI. This looks promising as Oracle's further development of Project Panama is based on these APIs.

JNI itself is still noticeably slower than pure Java. Note that the fastest API was java.util.Date (with a deprecated but still working Date.getSeconds). The JDK8's LocalDateTime is ~2.4 times faster than Calendar API, but yet a little slower than the old-style j.u.Date.

Now let's look into performance of the native call only, stripping out the struct allocation and field access:

Native call only (average time, smaller is better)

JmhCallOnly.jna                             1074.267 ±   8.909  ns/op
JmhCallOnly.jna_direct                      1146.169 ±  23.575  ns/op
JmhCallOnly.bridj                            307.207 ±   6.025  ns/op
JmhCallOnly.jnr                              256.508 ±   3.558  ns/op
JmhCallOnly.jni_javacpp                       44.727 ±   0.255  ns/op
JmhCallOnly.panama                            44.323 ±   0.709  ns/op

The order is nearly the same, leaving JNI and Panama the fastest.

cosmicdan/java-native-benchmark