libsdl-org/SDL_image

jpeg-xl tests have been failing on macos

sezero opened this issue · 10 comments

For some time, our jpeg-xl tests have been failing on macos, as can
be seen in the CI logs.

The CI logs say that brew is installing version 0.10.2 of libjxl.
None of the other runners use v0.10.x at the moment, so is it
possible that SDL_image has an issue with libjxl-0.10.x?
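
One way to check whether the failures track the libjxl version would be to log the version each runner actually loads. As far as I know, libjxl's `JxlDecoderVersion()` returns the version packed as `major * 1000000 + minor * 1000 + patch`. The sketch below only decodes such a value; the type and helper names are hypothetical and it does not link against libjxl.

```c
#include <stdint.h>

/* Hypothetical helper type: a libjxl version split into its parts. */
typedef struct {
    uint32_t major;
    uint32_t minor;
    uint32_t patch;
} jxl_version;

/* Decode a value packed as major * 1000000 + minor * 1000 + patch
 * (assumed to be the layout returned by JxlDecoderVersion()). */
jxl_version jxl_version_unpack(uint32_t packed)
{
    jxl_version v;
    v.major = packed / 1000000u;
    v.minor = (packed / 1000u) % 1000u;
    v.patch = packed % 1000u;
    return v;
}

/* Non-zero when the packed version belongs to the 0.10.x series. */
int jxl_version_is_0_10(uint32_t packed)
{
    jxl_version v = jxl_version_unpack(packed);
    return v.major == 0 && v.minor == 10;
}
```

In a test, the result of `jxl_version_is_0_10(JxlDecoderVersion())` could be printed next to a surface mismatch to see whether only the 0.10.x runners are affected.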

The CI failure mode(s) have also changed since the GitHub macos runners switched to arm64:
Before, there were errors about surface mismatches.
Now, IMG_Init(IMG_INIT_JXL) fails because it cannot find libjxl.0.10.dylib.
There is also an issue with libwebpdemux.2.dylib

Ouch. How is that happening? Something wrong with brew (e.g. not adding to dyld cache or something)?

I don't know how macOS dyld works, but it seems like no homebrew library can be loaded at all.
None of the libraries provided by homebrew can be found: libavif, libjxl, and libwebp. (I removed the installation of libjpeg, libpng, and libtiff.)

The macOS job now uploads CMake logs, which hint that homebrew installs to /opt/homebrew (e.g. the CMake cache contains libjxl_LIBRARY:FILEPATH=/opt/homebrew/lib/libjxl.dylib).
But the IMG_Init error message is:

Initialization should succeed (Failed loading libjxl.0.10.dylib: dlopen(libjxl.0.10.dylib, 0x0006): tried: 'libjxl.0.10.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibjxl.0.10.dylib' (no such file), '/Users/runner/work/SDL_image/SDL_image/build/libjxl.0.10.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/runner/work/SDL_image/SDL_image/build/libjxl.0.10.dylib' (no such file), '/var/folders/3m/p59k4qdj0f17st0gn2cmj3640000gn/T/setupsdl/66c57facf1111a4ec08d3d1abdf3c87f3062ffbea8d9eb4c9c412e9b5e7f59b7/package/lib/libjxl.0.10.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/var/folders/3m/p59k4qdj0f17st0gn2cmj3640000gn/T/setupsdl/66c57facf1111a4ec08d3d1abdf3c87f3062ffbea8d9eb4c9c412e9b5e7f59b7/package/lib/libjxl.0.10.dylib' (no such file), '/usr/lib/libjxl.0.10.dylib' (no such file, not in dyld cache), 'libjxl.0.10.dylib' (no such file))

There is no /opt/homebrew/lib path in the error message.
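
To illustrate the difference between loading by bare name and loading by full path, here is a minimal sketch. It is not SDL's or SDL_image's actual loading code; `join_lib_path` and `load_with_fallback` are hypothetical helpers, and the fallback directory is an assumption taken from the CMake cache entry above.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Join a directory and a library file name into buf.
 * Returns 0 on success, -1 if the result would not fit. */
int join_lib_path(char *buf, size_t size, const char *dir, const char *name)
{
    int n = snprintf(buf, size, "%s/%s", dir, name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Try the bare name first (the lookup that fails in the log above),
 * then retry with an explicit directory. Returns a handle or NULL. */
void *load_with_fallback(const char *name, const char *fallback_dir)
{
    char path[1024];
    void *handle = dlopen(name, RTLD_NOW | RTLD_LOCAL);

    if (handle) {
        return handle;
    }
    fprintf(stderr, "bare name failed: %s\n", dlerror());

    if (join_lib_path(path, sizeof(path), fallback_dir, name) != 0) {
        return NULL;
    }
    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "full path failed: %s\n", dlerror());
    }
    return handle;
}
```

Calling `load_with_fallback("libjxl.0.10.dylib", "/opt/homebrew/lib")` on the runner would show whether the library loads once the directory is spelled out, which would point at the search path rather than at the library itself.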

Looks like macos >= 13 has the issue. After changing the runner to macos-12, we get the old surface mismatch again.

I guess the path is on the macos library search path, because cmake can find the libs, yes?
Is it possible that macos-13 (arm64?) versions somehow quarantine those libs
and dlopen fails because of it?

Looks like CMake hardcodes /opt/homebrew.
This is also the default homebrew installation path.
This homebrew discussion is related.
The shellenv suggestion might work for our purposes.

Yes, looks like we'll need to add the homebrew lib directory to the dyld path somehow in our workflows.

When I add the homebrew library path to DYLD_LIBRARY_PATH, the test fails with a BUS error.
https://github.com/madebr/SDL_image/actions/runs/8927177439/job/24519945832#step:13:152

It fails during the BMP test, after successfully completing the avif test.
Or at least that is how it appears; the logs might be incomplete.
A search for "bus error DYLD_LIBRARY_PATH" suggests these errors are not uncommon.

Adding /opt/homebrew/lib to SDL_image's rpath won't fix the issue: it must be added to SDL3 (dlopen happens there)

Misaligned stack or something? I wonder whether it happens in SDL2 too.

Looks like msys started installing libjxl 0.10.2 and our tests started failing there too: https://github.com/libsdl-org/SDL_image/actions/runs/9005679096/job/24741542312

Couldn't we raise this issue in the libjxl bug tracker somehow?