Crash on some VM Linux systems
On some combinations of Linux and Java VM, the Java VM crashes when using imageio-openjpeg, for example on Ubuntu 21 with Java 11, or on Alpine/Docker with Java 17. Any idea what the cause might be?
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j com.kenai.jffi.Foreign.invokeN3O1(JJJJJLjava/lang/Object;III)J+0
j com.kenai.jffi.Invoker.invokeN3(Lcom/kenai/jffi/CallContext;JJJJILjava/lang/Object;Lcom/kenai/jffi/ObjectParameterStrategy;Lcom/kenai/jffi/ObjectParameterInfo;Ljava/lang/Object;Lcom/kenai/jffi/ObjectParameterStrategy;Lcom/kenai/jffi/ObjectParameterInfo;Ljava/lang/Object;Lcom/kenai/jffi/ObjectParameterStrategy;Lcom/kenai/jffi/ObjectParameterInfo;)J+126
j de.digitalcollections.openjpeg.lib.libopenjp2$jnr$ffi$0.opj_read_header(Ljnr/ffi/Pointer;Ljnr/ffi/Pointer;Ljnr/ffi/byref/PointerByReference;)Z+190
j de.digitalcollections.openjpeg.OpenJpeg.getImage(Ljnr/ffi/Pointer;Ljnr/ffi/Pointer;)Lde/digitalcollections/openjpeg/lib/structs/opj_image;+32
j de.digitalcollections.openjpeg.OpenJpeg.getInfo(Ljnr/ffi/Pointer;)Lde/digitalcollections/openjpeg/Info;+13
j de.digitalcollections.openjpeg.OpenJpeg.getInfo(Lde/digitalcollections/openjpeg/InStreamWrapper;)Lde/digitalcollections/openjpeg/Info;+5
j de.digitalcollections.openjpeg.imageio.OpenJp2ImageReader.getInfo()Lde/digitalcollections/openjpeg/Info;+16
j de.digitalcollections.openjpeg.imageio.OpenJp2ImageReader.checkIndex(I)V+2
j de.digitalcollections.openjpeg.imageio.OpenJp2ImageReader.read(ILjavax/imageio/ImageReadParam;)Ljava/awt/image/BufferedImage;+2
j javax.imageio.ImageIO.read(Ljavax/imageio/stream/ImageInputStream;)Ljava/awt/image/BufferedImage;+56 java.desktop@17.0.2
j javax.imageio.ImageIO.read(Ljava/io/InputStream;)Ljava/awt/image/BufferedImage;+35 java.desktop@17.0.2
j com.inet.jpeg2000.Jpeg2000ServerPlugin.a(Lcom/inet/plugin/ServerPluginManager;)V+366
Can you get your hands on a core dump and extract a backtrace for the parts beyond the FFI where the crash happens?
No, there is no apport tool in the Docker container, so we do not have a core dump yet.
We have created a sample repository, also containing a sample core dump:
https://github.com/gamma/temurin-jvm-docker-crash-sample/blob/main/core-dump.tar.gz
Brilliant, thank you, I'll investigate!
OK, so I could not reproduce the crash on my machine (Debian unstable, so pretty close to Ubuntu 21, OpenJDK 17.0.2 2022-01-18, x86_64). I also tried with OpenJDK 11.0.14 2022-01-18, also no crash.
Can you provide more details on your two environments, or maybe even a minimal Dockerfile for either of the two in the reproduction repo?
You can use the stock eclipse-temurin:17-sdk-alpine image to reproduce the issue.
We also added an issue with the JDK here: adoptium/adoptium-support#477
Maybe there is additional valuable information for you.
So I was able to get a bit further with this:
- It's not related to the Adoptium build; the crash happens just the same with the openjdk17-jdk build from the Alpine community repository.
- The crash happens when invoking a Java-provided callback from beyond the FFI via opj_event_msg. These callbacks can be found here: https://github.com/dbmdz/imageio-jnr/blob/main/imageio-openjpeg/src/main/java/de/digitalcollections/openjpeg/OpenJpeg.java#L72-L88
Here's the backtrace from gdb that shows the genesis:
Maybe this has something to do with the musl libc? Can you provide more information on your Ubuntu setup so I can try to reproduce it there?
Thanks for the info so far. We'll check that tomorrow and get back to you. The binary libopenjp was not specifically linked with musl afaik.
The original Linux one works with the adoptopenjdk 12 alpine stock image afaik. I already thought about that - and we tried with a custom glibc build which did not work. I can check for compiling libopenjp2 with musl tomorrow as well (or maybe there is one in the package repos for alpine)
Bingo, I just ran it inside the Docker container mentioned above with the libopenjp2 from the Alpine repository that was linked specifically against libmusl, and the test runs without a problem.
I think the issue is that the libopenjp2 was built against glibc and then relocated to musl; ldd prints some warnings as well:
/app # ldd /app/openjpeg/linux/libopenjp2.so.7
/lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
libm.so.6 => /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
libpthread.so.0 => /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
libc.so.6 => /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __vsnprintf_chk: symbol not found
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __pow_finite: symbol not found
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __fprintf_chk: symbol not found
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __sprintf_chk: symbol not found
---- edit:
Yep, that seems to be the case:
/app # nm -D openjpeg/linux/libopenjp2.so.7 |grep GLIBC
w __cxa_finalize@GLIBC_2.2.5
U __fprintf_chk@GLIBC_2.3.4
U __pow_finite@GLIBC_2.15
U __sprintf_chk@GLIBC_2.3.4
U __stack_chk_fail@GLIBC_2.4
U __vsnprintf_chk@GLIBC_2.3.4
U calloc@GLIBC_2.2.5
U fclose@GLIBC_2.2.5
U fopen@GLIBC_2.2.5
U fputc@GLIBC_2.2.5
U fread@GLIBC_2.2.5
U free@GLIBC_2.2.5
U fseeko@GLIBC_2.2.5
U ftello@GLIBC_2.2.5
U fwrite@GLIBC_2.2.5
U getenv@GLIBC_2.2.5
U getrusage@GLIBC_2.2.5
U malloc@GLIBC_2.2.5
U memcpy@GLIBC_2.14
U memmove@GLIBC_2.2.5
U memset@GLIBC_2.2.5
U posix_memalign@GLIBC_2.2.5
U pthread_attr_init@GLIBC_2.2.5
U pthread_attr_setdetachstate@GLIBC_2.2.5
U pthread_cond_destroy@GLIBC_2.3.2
U pthread_cond_init@GLIBC_2.3.2
U pthread_cond_signal@GLIBC_2.3.2
U pthread_cond_wait@GLIBC_2.3.2
U pthread_create@GLIBC_2.2.5
U pthread_join@GLIBC_2.2.5
U pthread_mutex_destroy@GLIBC_2.2.5
U pthread_mutex_init@GLIBC_2.2.5
U pthread_mutex_lock@GLIBC_2.2.5
U pthread_mutex_unlock@GLIBC_2.2.5
U realloc@GLIBC_2.2.5
U stdout@GLIBC_2.2.5
U strcpy@GLIBC_2.2.5
U strlen@GLIBC_2.2.5
U strtol@GLIBC_2.2.5
U sysconf@GLIBC_2.2.5
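The same check can be automated. Below is a minimal Java sketch that scans `nm -D` output for undefined symbols that look like glibc-internal helpers. The class name `GlibcSymbolScan` and the name-based heuristic (`__*_chk`, `__*_finite`) are assumptions made for illustration; the heuristic is derived only from the four symbols that ldd failed to relocate above and is not a complete list of glibc-only symbols.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Heuristic scan of `nm -D` output for glibc-internal helper symbols (illustrative only). */
class GlibcSymbolScan {
    // An undefined ("U") symbol carrying a GLIBC version tag,
    // e.g. "U __fprintf_chk@GLIBC_2.3.4".
    private static final Pattern UNDEFINED_GLIBC =
        Pattern.compile("^\\s*U\\s+(\\S+?)@+GLIBC_[0-9.]+\\s*$");

    // Assumption: names ending in _chk or _finite are glibc-internal helpers.
    // This covers the four symbols from the ldd errors in this thread and
    // deliberately does not match __stack_chk_fail.
    private static final Pattern GLIBC_ONLY =
        Pattern.compile("^__\\w+_(chk|finite)$");

    /** Returns the undefined symbols that match the glibc-only heuristic, in input order. */
    static List<String> suspectSymbols(List<String> nmLines) {
        List<String> hits = new ArrayList<>();
        for (String line : nmLines) {
            Matcher m = UNDEFINED_GLIBC.matcher(line);
            if (m.matches() && GLIBC_ONLY.matcher(m.group(1)).matches()) {
                hits.add(m.group(1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // A few lines taken from the nm output in this thread.
        List<String> sample = List.of(
            "w __cxa_finalize@GLIBC_2.2.5",
            "U __fprintf_chk@GLIBC_2.3.4",
            "U __pow_finite@GLIBC_2.15",
            "U __stack_chk_fail@GLIBC_2.4",
            "U malloc@GLIBC_2.2.5");
        System.out.println(suspectSymbols(sample)); // prints [__fprintf_chk, __pow_finite]
    }
}
```

A non-empty result is only a hint that the library was built against glibc with fortified or finite-math wrappers; the ldd run on the target system remains the authoritative check.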
---- edit:
The plot thickens. Recall the ldd warning about failing to relocate various __*printf_chk symbols? Guess what opj_event_msg calls:
if ((fmt != 00) && (p_event_mgr != 00)) {
    va_list arg;
    char message[OPJ_MSG_SIZE];
    memset(message, 0, OPJ_MSG_SIZE);
    /* initialize the optional parameter list */
    va_start(arg, fmt);
    /* parse the format string and put the result in 'message' */
    vsnprintf(message, OPJ_MSG_SIZE, fmt, arg); // 💣💣💣
    /* force zero termination for Windows _vsnprintf() of old MSVC */
    message[OPJ_MSG_SIZE - 1] = '\0';
    /* deinitialize the optional parameter list */
    va_end(arg);
    /* output the message to the user program */
    msg_handler(message, l_data);
}
https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/event.c#L128-L129
Sweet. Good catch. I did not check the libopenjp2 dependencies, just some others. That effectively means that we have to use a different openjp2 lib (will check that right away) or some obscure way to have glibc present, which is possible afaik.
Yes, I think the easiest way would be to rely on the distro-provided libopenjp2, or if that is not possible/desired, to ship an x86_64-unknown-linux-musl build in your JAR.
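If a musl build is shipped alongside the glibc one, the application has to pick the right one at startup. A rough Java sketch of that choice follows, under several assumptions: the class `NativeLibChooser` and the directory name `openjpeg/linux-musl` are hypothetical, `openjpeg/linux` mirrors the path seen in the ldd call above, and the presence of the musl loader at `/lib/ld-musl-x86_64.so.1` is used as a heuristic for "running on musl" (x86_64 only).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;

/** Picks a bundled native library directory based on the libc in use (illustrative only). */
class NativeLibChooser {
    // musl's dynamic loader for x86_64, as printed by ldd in this thread.
    static final Path MUSL_LOADER = Path.of("/lib/ld-musl-x86_64.so.1");

    // "openjpeg/linux" matches the layout seen in this thread;
    // "openjpeg/linux-musl" is a hypothetical name for the additional build.
    static final String GLIBC_DIR = "openjpeg/linux";
    static final String MUSL_DIR = "openjpeg/linux-musl";

    /** The existence check is injected so the decision can be tested without touching the file system. */
    static String resourceDir(Predicate<Path> exists) {
        return exists.test(MUSL_LOADER) ? MUSL_DIR : GLIBC_DIR;
    }

    public static void main(String[] args) {
        // On a real system, check the actual file system.
        System.out.println(resourceDir(Files::exists));
    }
}
```

The loader check is a heuristic: a glibc system with musl installed as an extra package would also match, so a real implementation would likely want an override, for example via a system property.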
I'll close this issue since it's not a problem with the library itself.