kcat/openal-soft

Crash on specific builds, when OAL is compiled using VS 2022 with Multithread DLL

smallmodel opened this issue · 5 comments

When OpenAL is compiled from source (VS Enterprise 2022, RelWithDebInfo), either from master or 1.23.1, and shipped for Win32 and Win64, it crashes on some rare builds:

  • The user who is having the issue is running Windows 10 LTSC Build 10.0.19044 with Sound BlasterX G5 drivers.
  • The DLL image loads fine
  • All symbols get loaded (GetProcAddress)
  • alcGetString works, but a call to alcIsExtensionPresent or alcOpenDevice makes the process exit. No message is logged even with ALSOFT_LOGLEVEL set to 3, and no crash dump is generated. Both functions point to valid addresses.

On my end, the compiled DLL works fine.
When the user uses soft_oal.dll from openal-soft releases directly, renamed to OpenAL64.dll, it works and doesn't crash.

After some struggle, here is the analysis and cause:

  • soft_oal.dll in the release page was compiled with MinGW
  • OpenAL64.dll is compiled with Visual Studio 2022, with MultiThreaded DLL (/MD) compiler flag
  • OpenAL64.dll is crashing when using MultiThreaded DLL, but works fine when the CRT is statically linked or when it uses debug runtime library DLL.
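To double-check which CRT flavor a given build actually picked up, a small helper like the one below can be compiled with the same flags as the DLL. This is only a sketch: `crt_flavor` is a hypothetical name, not part of OpenAL Soft, and it relies on the MSVC predefined macros `_DLL` (set for /MD and /MDd) and `_DEBUG` (set for the debug runtimes).

```cpp
#include <string>

// Hypothetical helper: reports which CRT flavor this translation unit
// was compiled against, based on MSVC's predefined macros.
std::string crt_flavor()
{
#if !defined(_MSC_VER)
    // MinGW and other toolchains do not use the /MD, /MT switches.
    return "non-MSVC toolchain";
#elif defined(_DLL) && defined(_DEBUG)
    return "/MDd (debug CRT DLL)";
#elif defined(_DLL)
    return "/MD (release CRT DLL)";
#elif defined(_DEBUG)
    return "/MTd (static debug CRT)";
#else
    return "/MT (static release CRT)";
#endif
}
```

Logging the returned string once at startup would make it easier to tell the three configurations listed above apart in user reports.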

The hypothesis is that it crashes because of a conflict over the shared CRT DLL between OAL and the standalone application. When OAL uses the debug CRT DLL, it works because the application uses the release one (so there is no conflict).
I don't have a stack trace and I can't reproduce the problem on my end unfortunately.

The workaround is to statically link the CRT when compiling OAL (/MT compiler flag). However, it would be better to find out why it crashes with the Multithreaded DLL CRT and eventually fix it.

EDIT: The crash also occurs for this user when using the soft-oal binaries produced by the GitHub Actions CI (master branch), as they're compiled using VS 2022.

kcat commented

I can't test what it's doing on native Windows, but at least with Wine, I do see the CI release build having a problem loading. When using the router DLL, it tries to load soft_oal.dll, which fails during initialization (with error 0xc0000005, a segfault) and causes the app to silently close. If I rename soft_oal.dll to OpenAL32.dll to use it directly, then I get a null pointer crash. Unfortunately I can't debug it much further; the only backtrace I can get is

Backtrace:
=>0 0x006ffffc273020 in msvcp140 (+0x13020) (0x006ffffc5d5f10)
  1 0x006ffffc5d0ffc in openal32 (+0x40ffc) (0x006ffffc5d5f10)
  2 0x006ffffc5d5f3c in openal32 (+0x45f3c) (0x006ffffc5d5f10)
  3 0x0000014000194d in openal-info (+0x194d) (0x006ffffc5d5f10)
  4 0x00000140002a81 in openal-info (+0x2a81) (0x00000000000061)
  5 0x000001400012f7 in openal-info (+0x12f7) (0x007ffffe895d90)
  6 0x00000140001406 in openal-info (+0x1406) (0000000000000000)
  7 0x006fffffaa91bd BaseThreadInitThunk+0xd(unknown=<internal error>, entry=<internal error>, arg=<internal error>) [./dlls/kernel32/thread.c:61] in kernel32 (0000000000000000)
  8 0x006fffffca5ddb in ntdll (+0x55ddb) (0000000000000000)

without any function names or line numbers. And I'm not able to find where openal32 made the msvcp140 call in the disassembled openal32.dll. It seems to happen before OpenAL Soft is able to open any log file, let alone write anything.

The debug CI build won't load since I don't have the debug CRT DLLs, but it probably would work anyway since it seems to for you. Making a debug soft_oal.dll that uses the release (non-debug) CRT may help narrow it down.

Found the calltrace:

Looks like the exact same crash some users get on native Windows, will try to investigate further on my end

Could you try to build Win32 OAL with the _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR compiler definition on your end? I believe that's done with -DCMAKE_C_FLAGS="/D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR /EHsc" -DCMAKE_CXX_FLAGS="/D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR /EHsc".
As seen here: https://stackoverflow.com/questions/78598141/first-stdmutexlock-crashes-in-application-built-with-latest-visual-studio - could be the culprit
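For reference, here is a minimal sketch of the pattern that question describes, as I understand it: code built against recent VS 2022 STL headers, where the std::mutex constructor became constexpr, can hit a null pointer on the first lock if an older msvcp140.dll ends up being loaded at runtime. The names `g_mutex` and `locked_increment` are hypothetical stand-ins for a library's internal lock and entry point, not OpenAL Soft code. The macro is normally passed on the command line so every translation unit sees it; defining it in source here just keeps the snippet self-contained, and it has no effect on non-MSVC toolchains.

```cpp
// Must come before any standard header is included.
#define _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR
#include <mutex>

namespace {
std::mutex g_mutex;  // hypothetical global, stands in for a library-internal lock
int g_calls = 0;
}

// Hypothetical entry point: the first thing it does is take the lock,
// which is where the affected builds would fault inside msvcp140.
int locked_increment()
{
    std::lock_guard<std::mutex> guard{g_mutex};
    return ++g_calls;
}
```

If the mismatch is the cause, a build with the define should get through the first `locked_increment()` call where a build without it exits.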

kcat commented

_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR is defined now with commit adc4574, and the CI release builds seem to work for me.

That solved it, the issue no longer occurs 👍