Romkabouter/ESP32-Rhasspy-Satellite

Changing sample rate

ayavilevich opened this issue · 4 comments

The README mentions a known issue with the sample rate and instructs users to use a sample rate of 22K or lower.
I am getting tons of "Buffer underflow" errors and I assume this issue affects me. However, it is not clear where to change this sample rate.

Is this the sample rate that is set in this code? Is this the sample rate that is set by the main Rhasspy site? Is this the sample rate of the microphone?

Can you elaborate, here or in the documentation, on what needs to be changed?

It is the sample rate of the audio sent TO the device.
If you use Rhasspy, you can choose the sample rate in the Google Wavenet settings (e.g. 44100).
Also, the feedback sounds can be set to any WAV file, which could have a sample rate > 22K.
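To check whether a feedback sound is above the limit before using it, the sample rate can be read from the WAV header. Below is a minimal Python sketch using the standard `wave` module. It assumes uncompressed PCM WAV files and that "22K" means 22050 Hz (the rate used in the ffmpeg command later in this thread); the file names in the usage comment are hypothetical.

```python
import sys
import wave

# Assumption: "22K" means 22050 Hz, the rate used in the ffmpeg command in this thread.
MAX_RATE = 22050


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def is_safe_rate(rate: int, max_rate: int = MAX_RATE) -> bool:
    """True if the rate is at or below the limit mentioned in the README."""
    return rate <= max_rate


if __name__ == "__main__":
    # Usage (hypothetical file names): python check_wav.py wake.wav listen.wav error.wav
    for path in sys.argv[1:]:
        rate = wav_sample_rate(path)
        status = "ok" if is_safe_rate(rate) else "too high, convert to 22050 Hz"
        print(f"{path}: {rate} Hz ({status})")
```

Reading only the header keeps the check fast; it does not decode any audio.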

The incoming sample rate is printed on the serial output when the device is attached to a computer. What device are you using?

I am using an Atom Echo.

I converted the "Beep" sounds to 22K and there is an improvement. I got "buffer underflow" only once, possibly at a time when more than one stream was being sent.

I am currently on NanoTTS for TTS. Is "Google Wavenet" the only TTS that supports a configurable sample rate? I prefer not to use cloud services.

Yes, as far as I know that is the only one (in Rhasspy, to be more specific).

I understand the cloud services argument, but the spoken text is cached and hashed per voice/sample rate.
So if you choose a language and a sample rate and keep those settings, every text is sent to the cloud service only once.
That cuts down a lot if you do not have too many random variables to be spoken, and even then there will be a limited set.
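The caching argument can be illustrated with a toy example. This is a hypothetical sketch, not Rhasspy's actual cache code: it assumes the cache key is a hash of voice, sample rate and text, and `synthesize` is a stand-in for the cloud TTS request.

```python
import hashlib
from typing import Callable


def tts_cache_key(text: str, voice: str, sample_rate: int) -> str:
    """Hypothetical key: the same text, voice and rate always give the same key."""
    material = f"{voice}|{sample_rate}|{text}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class TTSCache:
    """Toy in-memory cache; `synthesize` stands in for the cloud TTS call."""

    def __init__(self, synthesize: Callable[[str, str, int], bytes]):
        self._synthesize = synthesize
        self._store: dict[str, bytes] = {}
        self.cloud_calls = 0  # counts how often the "cloud" was actually asked

    def get(self, text: str, voice: str, sample_rate: int) -> bytes:
        key = tts_cache_key(text, voice, sample_rate)
        if key not in self._store:
            # Cache miss: only now is the cloud service contacted.
            self.cloud_calls += 1
            self._store[key] = self._synthesize(text, voice, sample_rate)
        return self._store[key]
```

Changing either the voice or the sample rate produces new keys, so every phrase goes to the cloud service once more; keeping both fixed is what keeps the number of requests bounded.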

Thanks for the hints. I just want to add that playing wake/listen/error files at 48 kHz doesn't seem to produce any sound output at all. I converted my previously used files from 48K to 22K and they work fine now.

This is the ffmpeg command I used to convert them:

> ffmpeg -i input.wav -ar 22050 output.wav
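To convert a whole folder of sounds, the same ffmpeg command can be run once per file. Below is a minimal Python sketch, assuming `ffmpeg` is on the PATH; the directory names are hypothetical, and output goes to a separate directory so the originals are not overwritten.

```python
import subprocess
import sys
from pathlib import Path

TARGET_RATE = 22050  # same rate as the ffmpeg command above


def ffmpeg_resample_cmd(src: Path, dst: Path, rate: int = TARGET_RATE) -> list[str]:
    """Build the equivalent of `ffmpeg -i input.wav -ar 22050 output.wav`."""
    return ["ffmpeg", "-i", str(src), "-ar", str(rate), str(dst)]


def convert_dir(src_dir: Path, out_dir: Path) -> None:
    """Resample every .wav in src_dir into out_dir, keeping the file names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob("*.wav")):
        dst = out_dir / src.name
        # check=True raises CalledProcessError if ffmpeg fails on a file.
        subprocess.run(ffmpeg_resample_cmd(src, dst), check=True)


if __name__ == "__main__":
    # Usage (hypothetical paths): python resample.py sounds/ sounds_22k/
    convert_dir(Path(sys.argv[1]), Path(sys.argv[2]))
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.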