kcat/openal-soft

WASAPI backend sounds different than dsound and winmm

sergeyext opened this issue · 7 comments

I hear a difference between backends on my laptop when using headphones. I also tested on my teammates' and sound designers' machines. We have no special post-processing settings in Windows or in the sound device drivers. To get measurable results, I made a basic sandbox application, recorded its output via a virtual loopback device, and analyzed it. I switched between backends using alsoft.ini in the current directory.
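For reference, the backend switch described here comes down to a single key. A minimal sketch, assuming the key sits in the [general] section of an alsoft.ini placed next to the executable (the logs further down show the file being loaded from that directory and report 'drivers' = 'wasapi'):

```ini
[general]
# Backend to use for this run; dsound and winmm are the other two compared in this report.
drivers = wasapi
```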

The first sample is a 'kick' from a drum machine app. The difference is clearly visible on the waveform:
kick-waveforms
Top to bottom:

  1. Loaded reference sound
  2. Playing with dsound, no issues
  3. Playing with wasapi, amplitude varies

The effect is not related to application start. If I loop the sample in the sandbox, nothing changes. Each kick has the same issue.
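One way to put a number on "amplitude varies" is to compare windowed RMS levels of the recording against the reference. The sketch below is only an illustration: the function names are made up for this example, and it assumes both signals are already decoded to lists of float samples, time-aligned, and at the same sample rate.

```python
import math


def rms_envelope(samples, window):
    """RMS level of each consecutive, non-overlapping window of samples."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + window]) / window)
        for i in range(0, len(samples) - window + 1, window)
    ]


def level_difference_db(reference, recorded, window=480, floor=1e-9):
    """Per-window level of `recorded` relative to `reference`, in dB.

    Both inputs are plain lists of float samples. The floor avoids
    log10(0) on silent windows.
    """
    ref_env = rms_envelope(reference, window)
    rec_env = rms_envelope(recorded, window)
    return [
        20.0 * math.log10(max(rec, floor) / max(ref, floor))
        for ref, rec in zip(ref_env, rec_env)
    ]
```

A list of nearly equal values would indicate a plain gain offset, while values that drift from window to window would match the varying amplitude seen in the third waveform.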

And these are spectra comparisons:
kick-cmp-ref-dsound-headphones
kick-cmp-ref-wasapi-headphones
Orange is the reference sound, blue is the played sound, and white is the difference. The first image was played with dsound, and the second one with wasapi.

Another example is a 'hat' sample from a music app. The waveform is not very informative here, so only the spectra comparisons are shown:
hat-cmp-ref-dsound-headphones
hat-cmp-ref-wasapi-headphones
The first one compares the reference with dsound, and the second one compares the reference with wasapi.

I tried changing buffer and sound source properties in the source code and various settings in alsoft.ini. I changed PCM sample types, sample rates, resampler settings, HRTF properties and so on. Nothing else affects the output this way.

This is ALSOFT_LOGLEVEL=3 output when playing kick.wav with wasapi:

[ALSOFT] (II) Initializing library v1.22.1-unknown UNKNOWN
[ALSOFT] (II) Supported backends: wasapi, dsound, winmm, null, wave
[ALSOFT] (II) Loading config C:\Users\Sergey\AppData\Roaming\alsoft.ini...
[ALSOFT] (II) Got binary: C:\code\alsoft-backends-test\cmake-build-debug, alsoft_backends_test.exe
[ALSOFT] (II) Loading config C:\code\alsoft-backends-test\cmake-build-debug\alsoft.ini...
[ALSOFT] (II)  found 'drivers' = 'wasapi'
[ALSOFT] (II) Key disable-cpu-exts not found
[ALSOFT] (II) Vendor ID: "AuthenticAMD"
[ALSOFT] (II) Name: "AMD Ryzen 7 4800H with Radeon Graphics"
[ALSOFT] (II) Extensions: +SSE +SSE2 +SSE3 +SSE4.1
[ALSOFT] (II) Key rt-prio not found
[ALSOFT] (II) Key rt-time-limit not found
[ALSOFT] (II) Key game_compat/reverse-x not found
[ALSOFT] (II) Key game_compat/reverse-y not found
[ALSOFT] (II) Key game_compat/reverse-z not found
[ALSOFT] (II) Key resampler not found
[ALSOFT] (II) Key trap-al-error not found
[ALSOFT] (II) Key trap-alc-error not found
[ALSOFT] (II) Key reverb/boost not found
[ALSOFT] (II) Found drivers = "wasapi"
[ALSOFT] (II) Starting message thread
[ALSOFT] (II) Message thread initialization complete
[ALSOFT] (II) Starting message loop
[ALSOFT] (II) Initialized backend "wasapi"
[ALSOFT] (II) Added "wasapi" for playback
[ALSOFT] (II) Added "wasapi" for capture
[ALSOFT] (II) Key excludefx not found
[ALSOFT] (II) Key default-reverb not found
[ALSOFT] (II) Key eax/enable not found
[ALSOFT] (II) Got message "Open Device" (0x0000, this=000001DBC0F8B7C0, param=0000000000000000)
[ALSOFT] (II) Key frequency not found
[ALSOFT] (II) Key sources not found
[ALSOFT] (II) Key slots not found
[ALSOFT] (II) Key sends not found
[ALSOFT] (II) Created device 000001DBC0F9E880, "OpenAL Soft on Headphone (Realtek(R) Audio)"
[ALSOFT] (II) Key sample-type not found
[ALSOFT] (II) Key channels not found
[ALSOFT] (II) Key ambi-format not found
[ALSOFT] (II) Key period_size not found
[ALSOFT] (II) Key periods not found
[ALSOFT] (II) Key hrtf not found
[ALSOFT] (II) Pre-reset: Stereo, Float32, 44100hz, 882 / 2646 buffer
[ALSOFT] (II) Got message "Reset Device" (0x0002, this=000001DBC0F8B7C0, param=0000000000000000)
[ALSOFT] (II) Device mix format:
    FormatTag      = 0xfffe
    Channels       = 2
    SamplesPerSec  = 48000
    AvgBytesPerSec = 384000
    BlockAlign     = 8
    BitsPerSample  = 32
    Size           = 22
    Samples        = 32
    ChannelMask    = 0x3
    SubFormat      = {00000003-0000-0010-8000-00aa00389b71}
[ALSOFT] (II) Requesting playback format:
    FormatTag      = 0xfffe
    Channels       = 2
    SamplesPerSec  = 48000
    AvgBytesPerSec = 384000
    BlockAlign     = 8
    BitsPerSample  = 32
    Size           = 22
    Samples        = 32
    ChannelMask    = 0x3
    SubFormat      = {00000003-0000-0010-8000-00aa00389b71}
[ALSOFT] (II) Post-reset: Stereo, Float32, 48000hz, 960 / 2880 buffer
[ALSOFT] (II) Key stereo-mode not found
[ALSOFT] (II) Key stereo-encoding not found
[ALSOFT] (II) Key hrtf-paths not found
[ALSOFT] (II) Searching C:\code\alsoft-backends-test\cmake-build-debug\*.mhr
[ALSOFT] (II) Searching C:\Users\Sergey\AppData\Roaming\openal\hrtf\*.mhr
[ALSOFT] (II) Searching C:\ProgramData\openal\hrtf\*.mhr
[ALSOFT] (II) Adding built-in entry "!1_Built-In HRTF"
[ALSOFT] (II) Key default-hrtf not found
[ALSOFT] (II) Loading !1_Built-In HRTF...
[ALSOFT] (II) Detected data set format v3
[ALSOFT] (II) Resampling HRTF Built-In HRTF (44100hz -> 48000hz)
[ALSOFT] (II) Loaded HRTF Built-In HRTF for sample rate 48000hz, 35-sample filter
[ALSOFT] (II) Key hrtf-size not found
[ALSOFT] (II) Key hrtf-mode not found
[ALSOFT] (II) 1st order + Full HRTF rendering enabled, using "Built-In HRTF"
[ALSOFT] (II) Channel config, Main: 4, Real: 2
[ALSOFT] (II) Allocating 6 channels, 24576 bytes
[ALSOFT] (II) Min delay: 8.00, max delay: 32.75, FIR length: 35
[ALSOFT] (II) New max delay: 24.75, FIR length: 60
[ALSOFT] (II) Key decoder/nfc not found
[ALSOFT] (II) Max sources: 256 (255 + 1), effect slots: 64, sends: 4
[ALSOFT] (II) Key dither not found
[ALSOFT] (II) Key dither-depth not found
[ALSOFT] (II) Dithering disabled
[ALSOFT] (II) Key output-limiter not found
[ALSOFT] (II) Output limiter disabled
[ALSOFT] (II) Fixed device latency: 0ns
[ALSOFT] (II) Got message "Start Device" (0x0003, this=000001DBC0F8B7C0, param=0000000000000000)
[ALSOFT] (II) Post-start: Stereo, Float32, 48000hz, 960 / 2880 buffer
[ALSOFT] (II) Increasing allocated voices to 256
[ALSOFT] (II) Key volume-adjust not found
[ALSOFT] (II) Created context 000001DBC2C6C4F0
ALC_SOFT_HRTF: 1
[ALSOFT] (II) Key hrtf-paths not found
[ALSOFT] (II) Searching C:\code\alsoft-backends-test\cmake-build-debug\*.mhr
[ALSOFT] (II) Searching C:\Users\Sergey\AppData\Roaming\openal\hrtf\*.mhr
[ALSOFT] (II) Searching C:\ProgramData\openal\hrtf\*.mhr
[ALSOFT] (II) Adding built-in entry "!1_Built-In HRTF"
[ALSOFT] (II) Key default-hrtf not found
HRTFs count: 1
HRTF Name: Built-In HRTF
Use HRTF: 1
HRTF status: 1
[ALSOFT] (II) Increasing allocated voice properties to 32
Playing...
Finished.
[ALSOFT] (II) Got message "Stop Device" (0x0004, this=000001DBC0F8B7C0, param=0000000000000000)
[ALSOFT] (II) Freeing context 000001DBC2C6C4F0
[ALSOFT] (II) Freed 0 context property objects
[ALSOFT] (II) Freed 0 AuxiliaryEffectSlot property objects
[ALSOFT] (II) Freeing device 000001DBC0F9E880
[ALSOFT] (II) Got message "Close Device" (0x0005, this=000001DBC0F8B7C0, param=0000000000000000)
[ALSOFT] (II) HrtfStore 000001DBC2D600C0 decreasing refcount to 0
[ALSOFT] (II) Unloading unused HRTF !1_Built-In HRTF

And this is the same with dsound:

[ALSOFT] (II) Initializing library v1.22.1-unknown UNKNOWN
[ALSOFT] (II) Supported backends: wasapi, dsound, winmm, null, wave
[ALSOFT] (II) Loading config C:\Users\Sergey\AppData\Roaming\alsoft.ini...
[ALSOFT] (II) Got binary: C:\code\alsoft-backends-test\cmake-build-debug, alsoft_backends_test.exe
[ALSOFT] (II) Loading config C:\code\alsoft-backends-test\cmake-build-debug\alsoft.ini...
[ALSOFT] (II)  found 'drivers' = 'dsound'
[ALSOFT] (II) Key disable-cpu-exts not found
[ALSOFT] (II) Vendor ID: "AuthenticAMD"
[ALSOFT] (II) Name: "AMD Ryzen 7 4800H with Radeon Graphics"
[ALSOFT] (II) Extensions: +SSE +SSE2 +SSE3 +SSE4.1
[ALSOFT] (II) Key rt-prio not found
[ALSOFT] (II) Key rt-time-limit not found
[ALSOFT] (II) Key game_compat/reverse-x not found
[ALSOFT] (II) Key game_compat/reverse-y not found
[ALSOFT] (II) Key game_compat/reverse-z not found
[ALSOFT] (II) Key resampler not found
[ALSOFT] (II) Key trap-al-error not found
[ALSOFT] (II) Key trap-alc-error not found
[ALSOFT] (II) Key reverb/boost not found
[ALSOFT] (II) Found drivers = "dsound"
[ALSOFT] (II) Initialized backend "dsound"
[ALSOFT] (II) Added "dsound" for playback
[ALSOFT] (II) Added "dsound" for capture
[ALSOFT] (II) Key excludefx not found
[ALSOFT] (II) Key default-reverb not found
[ALSOFT] (II) Key eax/enable not found
[ALSOFT] (II) Got device "OpenAL Soft on Headphone (Realtek(R) Audio)", GUID "{C37E0C3B-9277-4A6F-83BA-613B3A5E6FBA}"
[ALSOFT] (II) Got device "OpenAL Soft on Speaker (Realtek(R) Audio)"
[ALSOFT] (II) Got device "OpenAL Soft on Динамики (Steam Streaming Microphone)"
[ALSOFT] (II) Got device "OpenAL Soft on ASUS VA27EHE (NVIDIA High Definition Audio)"
[ALSOFT] (II) Got device "OpenAL Soft on Динамики (Steam Streaming Speakers)"
[ALSOFT] (II) Key frequency not found
[ALSOFT] (II) Key sources not found
[ALSOFT] (II) Key slots not found
[ALSOFT] (II) Key sends not found
[ALSOFT] (II) Created device 000001F67D45ABD0, "OpenAL Soft on Headphone (Realtek(R) Audio)"
[ALSOFT] (II) Key sample-type not found
[ALSOFT] (II) Key channels not found
[ALSOFT] (II) Key ambi-format not found
[ALSOFT] (II) Key period_size not found
[ALSOFT] (II) Key periods not found
[ALSOFT] (II) Key hrtf not found
[ALSOFT] (II) Pre-reset: Stereo, Float32, 44100hz, 882 / 2646 buffer
[ALSOFT] (II) Post-reset: Stereo, Int16, 44100hz, 882 / 2646 buffer
[ALSOFT] (II) Key stereo-mode not found
[ALSOFT] (II) Key stereo-encoding not found
[ALSOFT] (II) Key cf_level not found
[ALSOFT] (II) Stereo rendering
[ALSOFT] (II) Channel config, Main: 3, Real: 2
[ALSOFT] (II) Allocating 5 channels, 20480 bytes
[ALSOFT] (II) Enabling single-band first-order ambisonic decoder
[ALSOFT] (II) Max sources: 256 (255 + 1), effect slots: 64, sends: 4
[ALSOFT] (II) Key dither not found
[ALSOFT] (II) Key dither-depth not found
[ALSOFT] (II) Dithering enabled (16-bit, 32768)
[ALSOFT] (II) Key output-limiter not found
[ALSOFT] (II) Output limiter enabled, -0.0005dB limit
[ALSOFT] (II) Fixed device latency: 997732ns
[ALSOFT] (II) Post-start: Stereo, Int16, 44100hz, 882 / 2646 buffer
[ALSOFT] (II) Increasing allocated voices to 256
[ALSOFT] (II) Key volume-adjust not found
[ALSOFT] (II) Created context 000001F67D4ADF60
ALC_SOFT_HRTF: 1
[ALSOFT] (II) Key hrtf-paths not found
[ALSOFT] (II) Searching C:\code\alsoft-backends-test\cmake-build-debug\*.mhr
[ALSOFT] (II) Searching C:\Users\Sergey\AppData\Roaming\openal\hrtf\*.mhr
[ALSOFT] (II) Searching C:\ProgramData\openal\hrtf\*.mhr
[ALSOFT] (II) Adding built-in entry "!1_Built-In HRTF"
[ALSOFT] (II) Key default-hrtf not found
HRTFs count: 1
HRTF Name: Built-In HRTF
Use HRTF: 0
HRTF status: 0
[ALSOFT] (II) Increasing allocated voice properties to 32
Playing...
Finished.
[ALSOFT] (II) Freeing context 000001F67D4ADF60
[ALSOFT] (II) Freed 0 context property objects
[ALSOFT] (II) Freed 0 AuxiliaryEffectSlot property objects
[ALSOFT] (II) Freeing device 000001F67D45ABD0
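The two logs are long but differ in only a few places. A small sketch that pulls out the lines worth comparing, assuming the log text has been saved to a string; the function names and the choice of fields are made up for this example, and the patterns are taken from the log lines above:

```python
import re

# Matches e.g. "Post-reset: Stereo, Float32, 48000hz, 960 / 2880 buffer"
POST_RESET = re.compile(r"Post-reset: (\w+), (\w+), (\d+)hz, (\d+) / (\d+) buffer")


def summarize(log_text):
    """Collect the output format and rendering mode from an ALSOFT log."""
    summary = {"hrtf": False}
    for line in log_text.splitlines():
        match = POST_RESET.search(line)
        if match:
            rate = int(match.group(3))
            summary["channels"] = match.group(1)
            summary["sample_type"] = match.group(2)
            summary["rate_hz"] = rate
            # Period length in milliseconds: period frames / sample rate.
            summary["period_ms"] = 1000.0 * int(match.group(4)) / rate
        elif "HRTF rendering enabled" in line:
            summary["hrtf"] = True
    return summary


def differences(a, b):
    """Keys whose values differ between two summaries, as (a, b) pairs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Run on the two logs above, the period length comes out the same for both backends (20 ms), while the sample type, the sample rate, and the rendering mode differ.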

Here's the repro:
https://github.com/sergeyext/alsoft-backends-test
MSVC only, builds with CMake in a straightforward way.
Switching debug/release builds of the application and the library does not affect the reproducibility.
data/hat.wav and data/kick.wav are the reference sounds, and data/loopback contains the recorded sounds whose images I posted here. The issue occurs with headphones only! If I unplug the headphones and switch to the laptop's built-in speakers, there is no such difference. Some clicks emerge, but that's another issue.

Did you try explicitly disabling HRTF in your alsoft.ini? I think it's enabled by default when the playback device type is detected as headphones. You could also force disable it via the app like in sView
Also, might wanna update to the latest OpenAL Soft build 👀

@ThreeDeeJay
Initially I used release 1.22.1.
Now I upgraded to the latest 1.23.1 release, and it looks like that somewhat amplified the problem with the default HRTF.

This is how I built the new library:

cmake -G"NMake Makefiles JOM" -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=usr -DFORCE_STATIC_VCRT=ON -DALSOFT_UTILS=OFF -DALSOFT_NO_CONFIG_UTIL=ON -DALSOFT_EXAMPLES=OFF -DLIBTYPE=STATIC ..

And these are the kick's waveforms (wasapi and dsound):
kick-1 23 1

It looks like the long tail's amplitude is lower than in the previous version. The shape is reproducible on each launch of the test program.

Now let's vary the HRTF. First, I tried hrtf = off and got this in debug output:

[ALSOFT] (II) Found hrtf = "off"
[ALSOFT] (WW) general/hrtf is deprecated, please use stereo-encoding instead
[ALSOFT] (EE) Unexpected hrtf value: off

Ok, let's try all 3 values of stereo-encoding. Left to right: basic, uhj, hrtf:
kick-stereo-encoding
Basic looks okay and uhj has a small distortion. I need high-quality positional sound in my app and don't want to switch to basic.
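For reference, the setting being varied in this comparison, as a sketch. It assumes the key goes in the [general] section, which is where the deprecated general/hrtf option named in the warning above lives:

```ini
[general]
# One of: basic, uhj, hrtf (the three values compared above).
stereo-encoding = basic
```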

I think for a "clean" raw stereo signal you'll need basic. It should still have left<->right positional audio, but to fool your brain into thinking some sounds are coming from above, below, rear, etc. on headphones, you'll need HRTF, which alters the frequency response by design (because that's how our ears localize sound directions, in addition to amplitude and delay differences)
Though on the bright side, OpenAL Soft HRTF uses diffuse-field compensation so the coloration should be minimal.

Arghhh.
Initially I didn't notice Use HRTF: 0 in my own program's output when switching to dsound. If I force HRTF on with dsound in alsoft.ini, the output becomes identical to wasapi with all other defaults.

It would be great to mention that dsound and winmm implicitly disable hrtf in ## drivers: (global) comments section of alsoftrc.sample.

I have some more questions regarding the issue.

  1. Is there a way to disable HRTF filtering on a per-source basis? I didn't find such an extension in the list at first glance.

  2. Our sound designer complains that hrtf-filtered sounds get dimmer, flatter, less vibrant and less juicy.
    Are there any alternative HRTFs that don't disturb the sound so much? Or maybe the right way to go is to preprocess sounds for compensation?

  1. That's definitely a question for @kcat
  2. The default HRTF (MIT KEMAR) is actually one of the most neutral ones and preferred by many. That said, you can find more here https://kutt.it/FindOpenALSoftHRTF (Some also like SADIE II H2 for its neutral sound, though H6 sounds more positionally accurate to my ears; it's different for everyone)
    Also, in the latest builds you can just drop the MHR into the executable's working folder, otherwise you might have to specify it in the INI or maybe replace the built-in HRTF.
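If the MHR file is kept somewhere other than the working folder, the INI route might look like the sketch below. The key names are the ones the logs above report as "not found" (hrtf-paths, default-hrtf); the folder and the data set name are hypothetical placeholders, and placing the keys in [general] is an assumption.

```ini
[general]
# Hypothetical folder containing the downloaded .mhr files.
hrtf-paths = C:\audio\hrtf
# Hypothetical data set name; use the name the library reports when it lists HRTFs.
default-hrtf = MyDownloadedHRTF
```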

kcat commented

It would be great to mention that dsound and winmm implicitly disable hrtf in ## drivers: (global) comments section of alsoftrc.sample.

They don't implicitly disable HRTF. The WinMM backend can't detect whether the device is headphones or not, so it won't assume headphones instead of plain stereo speakers. HRTF will still be enabled on request (via the config file or context attributes); it isn't implicitly disabled, it just isn't automatically enabled.

The DSound backend technically should be able to detect headphones, and thus could enable HRTF automatically, but that depends on Windows. Specifically, it queries IDirectSound::GetSpeakerConfig, which can return DSSPEAKER_STEREO or DSSPEAKER_HEADPHONE for stereo output. If it gets DSSPEAKER_HEADPHONE, it will enable HRTF automatically (unless told differently via the config or attributes).
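The rule described in the last two paragraphs can be summarized as a small decision function. This is only a model of the explanation, not the library's code: the function name and the three-state `requested` argument are made up for this example.

```python
def hrtf_in_use(requested, headphones_detected):
    """Model of the rule described above.

    requested: True or False when the config file or context attributes
    ask for HRTF explicitly, None when nothing was specified.
    headphones_detected: whether the backend reported headphones
    (always False for a backend that cannot detect them, such as WinMM).
    """
    if requested is not None:
        return requested          # an explicit request always wins
    return headphones_detected    # otherwise follow the detection


# Nothing requested, headphones detected: HRTF is enabled automatically.
assert hrtf_in_use(None, True) is True
# Nothing requested, no headphone detection: plain stereo rendering.
assert hrtf_in_use(None, False) is False
# Explicit request on a backend without detection: HRTF is still enabled.
assert hrtf_in_use(True, False) is True
```

The dsound run earlier in the thread printed "Use HRTF: 0" and "Stereo rendering", which is consistent with the second case.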

Is there a way to disable HRTF filtering on a per-source basis? I didn't find such an extension in the list at first glance.

Using the AL_SOFT_direct_channels or AL_SOFT_direct_channels_remix extension. That disables 3D panning for selected sources, bypassing HRTF and other virtualization OpenAL may do, playing each source buffer channel directly on the matching output channel if it exists. Otherwise, the ALC_SOFT_HRTF extension allows disabling HRTF completely for all sources (and allows enabling it when it may otherwise be disabled by default, as well as selecting a specific HRTF "profile" when more than one are available).
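As a rough sketch of what those two options involve: in C the per-source switch is a single alSourcei(source, AL_DIRECT_CHANNELS_SOFT, value) call, and the device-wide switch is an ALC_HRTF_SOFT entry in the attribute list passed to alcCreateContext or alcResetDeviceSOFT. The Python below only mirrors that shape so it can run without an audio device. The helper names are made up for this example, `al_source_i` stands in for the real alSourcei (for instance a ctypes-loaded symbol), and the enum values are written out here on the assumption that they match OpenAL Soft's AL/alext.h.

```python
# Enum values assumed to match OpenAL Soft's AL/alext.h.
AL_DIRECT_CHANNELS_SOFT = 0x1033   # source property (AL_SOFT_direct_channels)
AL_FALSE = 0
AL_DROP_UNMATCHED_SOFT = 0x0001    # same value as AL_TRUE
AL_REMIX_UNMATCHED_SOFT = 0x0002   # AL_SOFT_direct_channels_remix
ALC_HRTF_SOFT = 0x1992             # context attribute (ALC_SOFT_HRTF)
ALC_FALSE = 0
ALC_TRUE = 1


def set_direct_channels(al_source_i, source, mode):
    """Call alSourcei(source, AL_DIRECT_CHANNELS_SOFT, value).

    al_source_i: any callable with alSourcei's (source, param, value) shape.
    mode: "off" keeps normal 3D panning, "drop" plays buffer channels
    directly and drops those without a matching output, "remix" plays
    them directly and remixes unmatched channels into existing outputs.
    """
    values = {
        "off": AL_FALSE,
        "drop": AL_DROP_UNMATCHED_SOFT,
        "remix": AL_REMIX_UNMATCHED_SOFT,
    }
    al_source_i(source, AL_DIRECT_CHANNELS_SOFT, values[mode])


def hrtf_attributes(enable):
    """Zero-terminated attribute list for alcCreateContext or alcResetDeviceSOFT."""
    return [ALC_HRTF_SOFT, ALC_TRUE if enable else ALC_FALSE, 0]
```

In a real application it is worth checking alIsExtensionPresent("AL_SOFT_direct_channels_remix") before using the remix value, and alcIsExtensionPresent(device, "ALC_SOFT_HRTF") before passing the HRTF attribute.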

Our sound designer complains that hrtf-filtered sounds get dimmer, flatter, less vibrant and less juicy.
Are there any alternative HRTFs that don't disturb the sound so much? Or maybe the right way to go is to preprocess sounds for compensation?

You shouldn't really preprocess the sounds to compensate, as there's no guarantee that HRTF will be used. A different HRTF can be used with different frequency characteristics, causing the compensation to fail. Whether or not the HRTF "disturbs" the sound depends on the person and expectation. If you expect a sound to sound exactly as it was authored, any HRTF will make it sound off since the purpose is to simulate the sound coming from an external point, where it's influenced by time (phase) and the ears and head (attenuating and boosting various frequencies, dependent on the size and shape of things), as if it was being played from a speaker somewhere around the listener. Though even if you are expecting that kind of "external quality" to the sound, the perception can be influenced by how well the HRTF aligns with the given listener; since HRTF models how the sound is influenced by the ears and head, the ideal response will be different for everyone. The closer the model is to the given listener, the better it will come across to the listener.

The MIT KEMAR model is generally thought of as the standard, a decent generic starting point with a very permissive license. Though it is quite old at this point (made in the late 90s, IIRC), and there could well be better generic HRTFs out there. I'm always on the lookout for something better with an acceptable license, though given how personal HRTFs can be, "better" can be hard to quantify on a general scale.

Thank you very much!
I have no more questions on this topic.