Issue with PitchShift, Distortion, and Compression effects when used with Azure TTS

Question

Thaendril opened this issue 5 months ago · 0 comments

Hello!

So, I help with audio for WingmanAI, and we use Pedalboard to layer a number of adjusted effects over our AI speech. This works out quite well in normal cases, but recently, when we tried applying PitchShift, Compression, or Distortion effects to our voices, Azure TTS started having an issue with it.

By that I mean that Azure TTS does send back data, but it plays at an incredibly low level, to the point that I only knew anything was playing because it showed up in my mixer.

This only happens with Azure, and it only happens with our output streaming setting turned on. If streaming is turned off and it instead just generates and reads a local .wav file that's created then the audio plays normally.
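In case it helps pin the difference down, here is a minimal sketch of how the level could be measured on both paths (streamed chunks versus the generated .wav data). It assumes the audio is little-endian 16-bit PCM, which may not match the format actually in use, and `pcm16_levels` is a hypothetical helper name, not something from Pedalboard or Azure.

```python
import math
import struct


def pcm16_levels(chunk: bytes) -> tuple[float, float]:
    """Return (peak_dbfs, rms_dbfs) for a chunk of audio.

    Assumption: `chunk` holds little-endian signed 16-bit PCM samples.
    A trailing odd byte, if any, is ignored.
    """
    n = len(chunk) // 2
    if n == 0:
        return (-math.inf, -math.inf)

    samples = struct.unpack(f"<{n}h", chunk[: n * 2])

    # Scale to the range [-1.0, 1.0] so results are relative to full scale.
    peak = max(abs(s) for s in samples) / 32768.0
    rms = math.sqrt(sum(s * s for s in samples) / n) / 32768.0

    def to_db(x: float) -> float:
        return 20.0 * math.log10(x) if x > 0 else -math.inf

    return (to_db(peak), to_db(rms))
```

Logging these two numbers before and after the effects chain, once with streaming on and once with it off, should show at which step the level drops.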

I can of course just increase the gain to something insane to compensate for the issue, but the downside is that if anyone does have output streaming off, it'll probably be the last thing they hear. :)
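One safer variant of that workaround might be to scale each chunk toward a target peak with a hard ceiling on the gain, so audio that is already loud is never boosted. The following is only a rough sketch under the same 16-bit PCM assumption; `normalize_pcm16`, `target_peak`, and `max_gain` are hypothetical names and values, not anything from our codebase or from Pedalboard.

```python
import struct


def normalize_pcm16(chunk: bytes, target_peak: float = 0.5, max_gain: float = 8.0) -> bytes:
    """Scale a chunk toward `target_peak` (0..1 of full scale), capped at `max_gain`.

    Assumption: `chunk` holds little-endian signed 16-bit PCM samples.
    """
    n = len(chunk) // 2
    samples = struct.unpack(f"<{n}h", chunk[: n * 2])
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        # Silent or empty chunk: nothing to scale.
        return chunk[: n * 2]

    # Gain needed to reach the target peak, limited by the ceiling.
    gain = min(target_peak * 32767.0 / peak, max_gain)

    # Apply the gain and clamp to the valid 16-bit range.
    scaled = [max(-32768, min(32767, round(s * gain))) for s in samples]
    return struct.pack(f"<{n}h", *scaled)
```

Because the gain is computed per chunk, the level can jump between chunks, so this would hide the symptom at best and does not explain why the streamed audio is so quiet in the first place.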

OpenAI, ElevenLabs, and various other providers have absolutely no issues.

So, to sum this up:

Only with Azure TTS, the audio level drops drastically when applying the PitchShift, Distortion, or Compression Pedalboard effects, for example as used in our new Radio SFX.

It only happens if Azure output streaming is turned on.
It doesn't matter whether we use Azure directly or the Wingman Pro API, so it's not our backend.

We tried:
Changing the audio format Azure generates, to rule out that it's heavily compressed. That's not it.
All the other providers and setups. No problem there.
Manually adjusting the gain values.
Testing multiple effects to see whether any others were causing issues.

I'm okay with sharing some of the code we have, but I don't know exactly what you'd need, and I don't want to just throw useless stuff in here. :)