icculus/mojoAL

Pitch shifting with libsamplerate?

Benau opened this issue · 7 comments

Benau commented

Hello icculus!
Not sure if you miss my supertuxkart team mail, but we've tried using your project as replacement for openal in stk-code, and we added a pitch-shfter ourselves.

It seems that actually "pitch shift without using resampling" is too slow for production usage (audio lags if too many sources requiring pitch shifting)..., so I came up with something like this with libsamplerate:
supertuxkart/stk-code@33459e8
(the original pitch shifter you posted on twitter is too slow)

What do you think about it? Or maybe you can add something to SDL_AudioStream API so that it can "live change" the dst / src ratio like libsamplerate, so we don't need to link against samplerate.


another quick question, possible to have smoother transition between 2 speakers?

Basically when some sound source is moving between center to left and right, the 2 speakers sound "digitally" instead of "analog" across 2 speakers (sound instantly changes between 2 speakers without fading in or out), while the official OpenAL has smoother experience

I haven't looked at your pitch shifter, but I think there was a concern that AL_PITCH had to operate in the frequency domain and not the time domain; that is, a sound can't be resampled to change pitch because it would finish more quickly, even if this generally doesn't matter for looping sounds...but I'd have to recheck the AL spec to see if that's actually true. The pitch-shifting code I wrote stinks for like 12 different reasons, so a better solution is definitely needed no matter what.

As for the transition: yes, this code also has problems. It's already an extremely naive approach, and on top of that, there's clearly some incorrect math somewhere causing that jarring transition. Fixing that is on the TODO list (but I'm going to split this comment into a second bug so there's an actual place we're tracking that change).

Benau commented

For the record libopenal pitch shift with resampling too (larger AL_PITCH will make the sound play faster with higher pitch)

Not sure if you prefer another issue, but just quick question, lets say a device has a 5.1 channel speaker, and if it plays audio with mojoal, will it utilize only 2 speaker out of 5 (6?) or all speakers will have sound output too, just half of it will have the same audio data?

Hey, I took a read at openAL Soft implementation and they reference this blog post in their source code: http://blogs.zynaptiq.com/bernsee/pitch-shifting-using-the-ft/ , at step "7. The Code" there's a good explanation on how to do such implementation that is very interesting. (also interesting fft library https://github.com/mborgerding/kissfft)

Hey, I took a read at openAL Soft implementation and they reference this blog post in their source code:

So I implemented this (using the code from that same blog) for DragonRuby Game Toolkit, and never actually committed this to the main repo because I was waiting for it to get more testing. I'll commit this shortly.

Note that each source that changes AL_PITCH with this algorithm will allocate 32 kilobytes worth of buffers. If you don't change the pitch, it won't do this, though. I guess in modern times that isn't much, but it felt exorbitant to me!

I've pushed the code I mentioned in #2 (comment), but I'm not super-happy with all the sin/cos/sqrt/etc calls it makes. It might be unavoidable with this approach, but it seems potentially quite expensive. But it's better than having no pitch support for now. I might end up adding a resampling implementation as a compile-time option, just to have the option for those that don't care about playback duration changing. Leaving this bug open for now.

I may be mistaken, but reading OpenAL's specification does not directly mention how the pitch suppose to work in description for the AL_PITCH, however, there's this paragraph in the description to AL_SEC_OFFSET on page 39:

This value is based on byte position, so a pitch-shifted source will have an exaggerated
playback speed. For example, you can be 0.500 seconds into a buffer having taken only
0.250 seconds to get there if the pitch is set to 2.0.

Does not that mean that the pitch implementation should modify duration, changing playback speed? Or does openal (or mojoal in particular) require additional changes from user side to have such effect?

Hmm, let me look into this. The AL_PITCH section doesn't specify anything, and I think the SEC_OFFSET part was originally a Loki extension.

But honestly, it's clearly vague in the spec, at best, and I'd rather do this with sample rate conversion. I'll look into it.