dbmdz/imageio-jnr

Crash on some VM Linux systems

Closed this issue · 12 comments

On some combinations of Linux and Java VM, the JVM crashes when using imageio-openjpeg, for example on Ubuntu 21 with Java 11 or on Alpine/Docker with Java 17. Any idea what the cause might be?

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  com.kenai.jffi.Foreign.invokeN3O1(JJJJJLjava/lang/Object;III)J+0
j  com.kenai.jffi.Invoker.invokeN3(Lcom/kenai/jffi/CallContext;JJJJILjava/lang/Object;Lcom/kenai/jffi/ObjectParameterStrategy;Lcom/kenai/jffi/ObjectParameterInfo;Ljava/lang/Object;Lcom/kenai/jffi/ObjectParameterStrategy;Lcom/kenai/jffi/ObjectParameterInfo;Ljava/lang/Object;Lcom/kenai/jffi/ObjectParameterStrategy;Lcom/kenai/jffi/ObjectParameterInfo;)J+126
j  de.digitalcollections.openjpeg.lib.libopenjp2$jnr$ffi$0.opj_read_header(Ljnr/ffi/Pointer;Ljnr/ffi/Pointer;Ljnr/ffi/byref/PointerByReference;)Z+190
j  de.digitalcollections.openjpeg.OpenJpeg.getImage(Ljnr/ffi/Pointer;Ljnr/ffi/Pointer;)Lde/digitalcollections/openjpeg/lib/structs/opj_image;+32
j  de.digitalcollections.openjpeg.OpenJpeg.getInfo(Ljnr/ffi/Pointer;)Lde/digitalcollections/openjpeg/Info;+13
j  de.digitalcollections.openjpeg.OpenJpeg.getInfo(Lde/digitalcollections/openjpeg/InStreamWrapper;)Lde/digitalcollections/openjpeg/Info;+5
j  de.digitalcollections.openjpeg.imageio.OpenJp2ImageReader.getInfo()Lde/digitalcollections/openjpeg/Info;+16
j  de.digitalcollections.openjpeg.imageio.OpenJp2ImageReader.checkIndex(I)V+2
j  de.digitalcollections.openjpeg.imageio.OpenJp2ImageReader.read(ILjavax/imageio/ImageReadParam;)Ljava/awt/image/BufferedImage;+2
j  javax.imageio.ImageIO.read(Ljavax/imageio/stream/ImageInputStream;)Ljava/awt/image/BufferedImage;+56 java.desktop@17.0.2
j  javax.imageio.ImageIO.read(Ljava/io/InputStream;)Ljava/awt/image/BufferedImage;+35 java.desktop@17.0.2
j  com.inet.jpeg2000.Jpeg2000ServerPlugin.a(Lcom/inet/plugin/ServerPluginManager;)V+366

hs_err_pid165.log

Can you get your hands on a core dump and extract a backtrace for the parts beyond the FFI where the crash happens?

No, there is no apport tool in the Docker container, so we do not have a core dump yet.

Brilliant, thank you, I'll investigate!

OK, so I could not reproduce the crash on my machine (Debian unstable, so pretty close to Ubuntu 21, OpenJDK 17.0.2 2022-01-18, x86_64). I also tried with OpenJDK 11.0.14 2022-01-18, also no crash.
Can you provide more details on your two environments, or maybe even a minimal Dockerfile for either of the two in the reproduction repo?

gamma commented

You can use the stock eclipse-temurin:17-sdk-alpine image to reproduce the issue.

gamma commented

We also added an issue with the JDK here: adoptium/adoptium-support#477

Maybe there is additional valuable information for you.

So I was able to get a bit further with this:

Here's the backtrace from gdb that shows the genesis:
[image: screenshot of the gdb backtrace]

Maybe this has something to do with the musl libc? Can you provide more information on your Ubuntu setup so I can try to reproduce it there?

gamma commented

Thanks for the info so far. We'll check that tomorrow and get back to you. The libopenjp2 binary was not specifically linked against musl afaik.

The original Linux one works with the adoptopenjdk 12 alpine stock image afaik. I already thought about that, and we tried with a custom glibc build, which did not work. I can look into compiling libopenjp2 against musl tomorrow as well (or maybe there is one in the Alpine package repos).

Bingo, I just ran it inside the Docker container mentioned above with the libopenjp2 from the Alpine repository that was linked specifically against libmusl and the test runs without a problem.
I think the issue is that the bundled libopenjp2 was built against glibc and is then loaded against musl. ldd prints relocation errors as well:

/app # ldd /app/openjpeg/linux/libopenjp2.so.7
        /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
        libm.so.6 => /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
        libpthread.so.0 => /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
        libc.so.6 => /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __vsnprintf_chk: symbol not found
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __pow_finite: symbol not found
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __fprintf_chk: symbol not found
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __sprintf_chk: symbol not found
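
The relocation errors above can also be picked out mechanically, for instance in a CI check on the bundled binary. The following is only a minimal sketch, assuming the `ldd` output has already been captured as a string; the class and method names are hypothetical and not part of imageio-jnr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of imageio-jnr. */
final class LddRelocationErrors {
    // Matches musl ldd lines of the form
    // "Error relocating <library>: <symbol>: symbol not found".
    // Assumes the library path contains no whitespace.
    private static final Pattern MISSING = Pattern.compile(
            "^Error relocating (\\S+): (\\S+): symbol not found[ \\t]*$",
            Pattern.MULTILINE);

    /** Returns the unresolved symbol names in order of appearance. */
    static List<String> missingSymbols(String lddOutput) {
        List<String> symbols = new ArrayList<>();
        Matcher m = MISSING.matcher(lddOutput);
        while (m.find()) {
            symbols.add(m.group(2)); // group 2 is the symbol name
        }
        return symbols;
    }

    public static void main(String[] args) {
        // Two lines copied from the ldd output in this thread.
        String sample =
                "        libc.so.6 => /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)\n"
              + "Error relocating /app/openjpeg/linux/libopenjp2.so.7: __vsnprintf_chk: symbol not found\n"
              + "Error relocating /app/openjpeg/linux/libopenjp2.so.7: __pow_finite: symbol not found\n";
        for (String symbol : missingSymbols(sample)) {
            System.out.println(symbol);
        }
    }
}
```

A non-empty result means the library references symbols that the musl loader cannot resolve, which is the situation seen in the container.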

---- edit:

Yep, that seems to be the case:

/app # nm -D openjpeg/linux/libopenjp2.so.7 |grep GLIBC
                 w __cxa_finalize@GLIBC_2.2.5
                 U __fprintf_chk@GLIBC_2.3.4
                 U __pow_finite@GLIBC_2.15
                 U __sprintf_chk@GLIBC_2.3.4
                 U __stack_chk_fail@GLIBC_2.4
                 U __vsnprintf_chk@GLIBC_2.3.4
                 U calloc@GLIBC_2.2.5
                 U fclose@GLIBC_2.2.5
                 U fopen@GLIBC_2.2.5
                 U fputc@GLIBC_2.2.5
                 U fread@GLIBC_2.2.5
                 U free@GLIBC_2.2.5
                 U fseeko@GLIBC_2.2.5
                 U ftello@GLIBC_2.2.5
                 U fwrite@GLIBC_2.2.5
                 U getenv@GLIBC_2.2.5
                 U getrusage@GLIBC_2.2.5
                 U malloc@GLIBC_2.2.5
                 U memcpy@GLIBC_2.14
                 U memmove@GLIBC_2.2.5
                 U memset@GLIBC_2.2.5
                 U posix_memalign@GLIBC_2.2.5
                 U pthread_attr_init@GLIBC_2.2.5
                 U pthread_attr_setdetachstate@GLIBC_2.2.5
                 U pthread_cond_destroy@GLIBC_2.3.2
                 U pthread_cond_init@GLIBC_2.3.2
                 U pthread_cond_signal@GLIBC_2.3.2
                 U pthread_cond_wait@GLIBC_2.3.2
                 U pthread_create@GLIBC_2.2.5
                 U pthread_join@GLIBC_2.2.5
                 U pthread_mutex_destroy@GLIBC_2.2.5
                 U pthread_mutex_init@GLIBC_2.2.5
                 U pthread_mutex_lock@GLIBC_2.2.5
                 U pthread_mutex_unlock@GLIBC_2.2.5
                 U realloc@GLIBC_2.2.5
                 U stdout@GLIBC_2.2.5
                 U strcpy@GLIBC_2.2.5
                 U strlen@GLIBC_2.2.5
                 U strtol@GLIBC_2.2.5
                 U sysconf@GLIBC_2.2.5
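
The same check can be scripted: any undefined symbol carrying a `GLIBC_` version tag indicates that the binary was linked against glibc. A rough sketch, assuming the `nm -D` output is available as a string and that undefined symbols are printed as `U name@GLIBC_x.y` as in the listing above; the class name is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of imageio-jnr. */
final class GlibcSymbolScan {
    // Undefined dynamic symbols appear as "<padding>U name@GLIBC_<version>".
    // Weak ("w") entries are deliberately not matched.
    private static final Pattern UNDEFINED_GLIBC = Pattern.compile(
            "^[ \\t]*U[ \\t]+([^@\\s]+)@(GLIBC_[0-9.]+)[ \\t]*$",
            Pattern.MULTILINE);

    /** Maps each undefined symbol to the glibc version tag it requires. */
    static Map<String, String> undefinedGlibcSymbols(String nmOutput) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = UNDEFINED_GLIBC.matcher(nmOutput);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three lines copied from the nm output in this thread.
        String sample =
                "                 w __cxa_finalize@GLIBC_2.2.5\n"
              + "                 U __fprintf_chk@GLIBC_2.3.4\n"
              + "                 U memcpy@GLIBC_2.14\n";
        undefinedGlibcSymbols(sample)
                .forEach((name, version) -> System.out.println(name + " needs " + version));
    }
}
```

Most of the listed symbols (malloc, memcpy, the pthread functions) also exist in musl, so a match by itself does not predict a crash. It only flags the binary as a glibc build that deserves a closer look with ldd.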

---- edit:

The plot thickens: recall the ldd warning about failing to relocate various __*printf_chk symbols? A glibc build with _FORTIFY_SOURCE compiles a plain vsnprintf call into __vsnprintf_chk. Guess what opj_event_msg calls:

    if ((fmt != 00) && (p_event_mgr != 00)) {
        va_list arg;
        char message[OPJ_MSG_SIZE];
        memset(message, 0, OPJ_MSG_SIZE);
        /* initialize the optional parameter list */
        va_start(arg, fmt);
        /* parse the format string and put the result in 'message' */
        vsnprintf(message, OPJ_MSG_SIZE, fmt, arg);  // 💣💣💣
        /* force zero termination for Windows _vsnprintf() of old MSVC */
        message[OPJ_MSG_SIZE - 1] = '\0';
        /* deinitialize the optional parameter list */
        va_end(arg);

        /* output the message to the user program */
        msg_handler(message, l_data);
    }

https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/event.c#L128-L129

gamma commented

Sweet. Good catch. I did not check the libopenjp2 dependencies - just some others. That effectively means that we have to use a different openjp2 lib (will check that right away) or some obscure way to have glibc present - which is possible afaik.

Yes, I think the easiest way would be to rely on the distro-provided libopenjp2, or if that is not possible/desired, to ship an x86_64-unknown-linux-musl build in your JAR.
I'll close this issue since it's not a problem with the library itself.
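
If both a glibc and a musl build are shipped in the JAR, the application has to pick one at runtime. One possible approach, sketched below under stated assumptions, is to look for the musl dynamic loader (`/lib/ld-musl-x86_64.so.1` in the ldd output above). The class name, the method names and the `openjpeg/linux-musl/` resource path are hypothetical; only `openjpeg/linux/libopenjp2.so.7` appears in this thread.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of imageio-jnr. */
final class NativeVariantChooser {
    /** True if libDir contains a musl dynamic loader such as ld-musl-x86_64.so.1. */
    static boolean hasMuslLoader(Path libDir) {
        if (!Files.isDirectory(libDir)) {
            return false;
        }
        try (DirectoryStream<Path> loaders =
                     Files.newDirectoryStream(libDir, "ld-musl-*.so.1")) {
            return loaders.iterator().hasNext();
        } catch (IOException e) {
            // Unreadable directory: treat as "no musl loader found".
            return false;
        }
    }

    /** Picks the bundled library resource; the musl path is an assumed layout. */
    static String resourceFor(boolean musl) {
        return musl
                ? "openjpeg/linux-musl/libopenjp2.so.7"
                : "openjpeg/linux/libopenjp2.so.7";
    }

    public static void main(String[] args) {
        boolean musl = hasMuslLoader(Paths.get("/lib"));
        System.out.println("Would load: " + resourceFor(musl));
    }
}
```

This is a heuristic: a glibc system with a musl compatibility loader installed would be misdetected, so relying on the distro-provided libopenjp2 remains the simpler option where it is available.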