Romkabouter/ESP32-Rhasspy-Satellite

All Audio is just hissing and crackling.

whinis opened this issue · 42 comments

I have a Matrix Voice ESP32 and everything I attempt to play over MQTT comes out as hissing and crackling or is extremely quiet.
I have tried playing from Rhasspy directly and even the beeps don't come through, just a barely audible hiss.

One of my responses from Rhasspy reads from a list; some words are audible but quiet, and others are just hissing.

Then I tried streaming audio to MQTT from Python, but all I got was crackling.

Changing the volume or using headphones seems to have no effect.

What is the sample rate of your audio files?

I tried 44.1k, 16k, and 8k

Hmm, everything below 44.1 kHz should be OK. Sadly I cannot reproduce this, so I have no clue what the issue is.
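
To check what a given file actually contains before sending it, the WAV header can be read directly. The following is only a minimal sketch using Python's standard `wave` module; the function names `describe_wav` and `is_below_limit` are made up for illustration, and the 44100 Hz limit is simply the rule of thumb from the comment above, not a documented firmware constant.

```python
import wave

def describe_wav(source):
    """Return (sample_rate, channels, bits_per_sample) from a WAV header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8

def is_below_limit(sample_rate, limit=44100):
    # Assumption from the thread: rates below 44.1 kHz should play fine.
    return sample_rate < limit
```

Calling `describe_wav("some_file.wav")` on a local file returns the same three values that the satellite prints in its `Samplerate: ..., Channels: ..., Bits per Sample: ...` log line.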

I am wondering if I somehow got a matrix with a bad amp ?

@whinis hey, see if there is a correlation between the length of the audio and the problems.

You can try this to dump the stream from mqtt to a file to do an offline check. Maybe it will help to pin-point the issue.
https://github.com/ayavilevich/rhasspy-helper/blob/main/logAudio.js

Hello,
I have the same issue on a Matrix Voice ESP32...
I got it working for a few tests with different voices in Rhasspy and now I am back to hissing and crackling.
I am looking into it too...
The wake word works.

OK,
I have no idea why, but after playing around with the settings in the web interface (changing the audio output, mute input, mute output, volume, etc.), the incoming audio (Matrix Voice -> Rhasspy) now works OK.
The Text to Speech via the "speak" button in rhasspy's web interface plays fine via the headphone jack.
The responses from home assistant (POST to :12101/api/text-to-speech) are fine.
BUT the sound that rhasspy makes when you say the wake word is hissing.......

There is something strange going on with the volume... at one point I got it to start low and progressively increase while speaking,
then I restarted the Matrix Voice and now it's back to normal, but Rhasspy's wake sound is still hissing...

Mine also started after changing the volume, as it seemed low, but I have been unable to recover it.

Good pointers! I never change the volume so it might be an issue indeed.
I could not reproduce, but have some new leads now :)

Could you try to reflash the Matrix Voice?

  • attach it to the Pi (the same one you initially flashed it with)
  • reset by this: sudo voice_esp32_enable and esptool.py --chip esp32 --port /dev/ttyS0 --baud 115200 --before default_reset --after hard_reset erase_flash
  • reboot the Pi, all LEDs should now stay off
  • reflash the software, don't touch the volume

You need to erase and reflash, because the volume setting is written to a memory address.
I will also try to see if I can reproduce it when adjusting the volume.

May I also suggest to post here the log that comes from the serial of the esp32. Just remove any duplicate lines to keep it readable. This will show errors, if any, and details about the audio streams that are being played.

Hmm I didn't erase ... ok I will try and get back to you with more info.
@ayavilevich maybe pins 8 and 10 on the Raspberry Pi GPIO header side can get me the serial out log....
https://matrix-io.github.io/matrix-documentation/matrix-voice/resources/pinout/
Do you know an easier way (something in the Matrix Creator software maybe? I am looking at it for the first time now...), or is it the usual USB-to-serial adapter with a level converter to 3.3 V, as used to program all ESP chips?

I am shooting in the blind as I didn't look at the source, but are you using the FPGA for something?
I see that the deploy.sh is flashing the ESP32 chip only...

@Dimitar-Boychev if you attach the matrix voice to the same Pi as you installed it, you can use minicom
sudo apt-get install minicom
sudo minicom

Configure the serial port to /dev/ttyS0 and the logs will be printed :)

Initial logs with the firmware from yesterday:

Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Buffer underflow
!!! TONS of Buffer underflow !!!
Enter HotwordDetected
Buffer underflow
Done
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Done
Enter Idle

Reset via : sudo voice_esp32_enable and esptool.py --chip esp32 --port /dev/ttyS0 --baud 115200 --before default_reset --after hard_reset erase_flash
Reflash via ./deploy.sh from git bash
Rhasspy wakeup sounds hissing
TTS from Rhasspy web interface hissing too

Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Done
Enter HotwordDetected
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16 <-- WAKEUP WORLD
Done
Enter Idle
Samplerate: 22050, Channels: 1, Format: 1, Bits per Sample: 16 <-- TTS FROM WEB INTERFACE
Done

Changing GAIN to 4 from web interface -> does not fix TTS
Changing volume to 75 from web interface -> ESP restart because of panic

Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandl.
Core 0 register dump:
PC : 0x4017d4f3 PS : 0x00060230 A0 : 0x8017d6c3 A1 : 0x3
A2 : 0x00000000 A3 : 0x3fff58b8 A4 : 0x00000000 A5 : 0x3
A6 : 0x00000014 A7 : 0x00000000 A8 : 0x80087a60 A9 : 0x3
A10 : 0x3fff6634 A11 : 0x0000166f A12 : 0x68514260 A13 : 0x0
A14 : 0x00060a23 A15 : 0x00000000 SAR : 0x00000019 EXCCAUSE: 0x0
EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xf

ELF file SHA256: 0000000000000000

Backtrace: 0x4017d4f3:0x3ffdd620 0x4017d6c0:0x3ffdd640 0x4017d702:0x3ffdd660 0x0

Rebooting...
I (10) boot: ESP-IDF v3.1 2nd stage bootloader
I (10) boot: compile time 11:08:24
I (11) boot: Enabling RNG early entropy source...
I (14) boot: SPI Speed : 40MHz
I (18) boot: SPI Mode : DIO
I (22) boot: SPI Flash Size : 4MB
I (26) boot: Partition Table:
I (29) boot: ## Label Usage Type ST Offset Length
I (37) boot: 0 nvs WiFi data 01 02 00009000 00005000
I (44) boot: 1 otadata OTA data 01 00 0000e000 00002000
I (52) boot: 2 app0 OTA app 00 10 00010000 001e0000
I (59) boot: 3 app1 OTA app 00 11 001f0000 001e0000
I (67) boot: 4 eeprom Unknown data 01 99 003f0000 00001000
I (74) boot: 5 spiffs Unknown data 01 82 003f1000 0000f000
I (82) boot: End of partition table
I (86) boot: No factory image, trying OTA 0
I (91) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x402c4 (26p
I (192) esp_image: segment 1: paddr=0x000502ec vaddr=0x3ffbdb60 size=0x03ea8 ( d
I (198) esp_image: segment 2: paddr=0x0005419c vaddr=0x40080000 size=0x00400 ( d
I (199) esp_image: segment 3: paddr=0x000545a4 vaddr=0x40080400 size=0x0ba6c ( d
I (227) esp_image: segment 4: paddr=0x00060018 vaddr=0x400d0018 size=0xc6c00 (8p
I (512) esp_image: segment 5: paddr=0x00126c20 vaddr=0x4008be6c size=0x04978 ( d
I (531) boot: Loaded app from partition at offset 0x10000
I (531) boot: Disabling RNG early entropy source...
Booting
Matrix Voice Initialized
Loading configuration
{
"mqtt_host": "XXX.XXX.XXX.XXX",
"mqtt_port": 1883,
"mqtt_user": "username",
"mqtt_pass": "password",
"mute_input": false,
"mute_output": false,
"amp_output": 1,
"brightness": 15,
"hotword_brightness": 15,
"hotword_detection": 1,
"volume": 75,
"gain": 4
}
Creating I2Stask
Enter WifiDisconnected
Total heap: 272168
Free heap: 188544
Enter WifiConnected

TTS still broken:

Enter Idle
Samplerate: 22050, Channels: 1, Format: 1, Bits per Sample: 16
Done

Output to headphone -> speakers

E (137116) task_wdt: Task watchdog got triggered. The following tasks did not r:
E (137116) task_wdt: - IDLE0 (CPU 0)
E (137116) task_wdt: Tasks currently running:
E (137116) task_wdt: CPU 0: I2Stask
E (137116) task_wdt: CPU 1: IDLE1
E (137116) task_wdt: Aborting.
abort() was called at PC 0x401545e0 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x40089948:0x3ffbfb00 0x40089bc5:0x3ffbfb20 0x401545e0:0x3ffbfb40 0x0

Rebooting...
I (10) boot: ESP-IDF v3.1 2nd stage bootloader
I (10) boot: compile time 11:08:24
I (11) boot: Enabling RNG early entropy source...
I (14) boot: SPI Speed : 40MHz
I (18) boot: SPI Mode : DIO
I (22) boot: SPI Flash Size : 4MB
I (26) boot: Partition Table:
I (29) boot: ## Label Usage Type ST Offset Length
I (37) boot: 0 nvs WiFi data 01 02 00009000 00005000
I (44) boot: 1 otadata OTA data 01 00 0000e000 00002000
I (52) boot: 2 app0 OTA app 00 10 00010000 001e0000
I (59) boot: 3 app1 OTA app 00 11 001f0000 001e0000
I (67) boot: 4 eeprom Unknown data 01 99 003f0000 00001000
I (74) boot: 5 spiffs Unknown data 01 82 003f1000 0000f000
I (82) boot: End of partition table
I (86) boot: No factory image, trying OTA 0
I (91) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x402c4 (26p
I (192) esp_image: segment 1: paddr=0x000502ec vaddr=0x3ffbdb60 size=0x03ea8 ( d
I (198) esp_image: segment 2: paddr=0x0005419c vaddr=0x40080000 size=0x00400 ( d
I (199) esp_image: segment 3: paddr=0x000545a4 vaddr=0x40080400 size=0x0ba6c ( d
I (227) esp_image: segment 4: paddr=0x00060018 vaddr=0x400d0018 size=0xc6c00 (8p
I (512) esp_image: segment 5: paddr=0x00126c20 vaddr=0x4008be6c size=0x04978 ( d
I (531) boot: Loaded app from partition at offset 0x10000
I (531) boot: Disabling RNG early entropy source...
Booting
Matrix Voice Initialized
Loading configuration
{
"mqtt_host": "XXX.XXX.XXX.XXX",
"mqtt_port": 1883,
"mqtt_user": "username",
"mqtt_pass": "password",
"mute_input": false,
"mute_output": false,
"amp_output": 0,
"brightness": 15,
"hotword_brightness": 15,
"hotword_detection": 1,
"volume": 75,
"gain": 4
}
Creating I2Stask
Enter WifiDisconnected
Total heap: 272184
Free heap: 188688
Enter WifiConnected
Connected to Wifi with IP: YYY.YYY.YYY.YYY, SSID: WIFI_SSID, BSSID: AA:AA:AA:AA:AA
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
Enter MQTTConnected
Connected as satellite
Enter Idle

Output to speakers -> headphones (no restart this time), TTS hissing
Hotword brightness: 15 -> 40 (restart log below) TTS hissing

Parameter mqtt_host, value XXX.XXX.XXX.XXX
Parameter mqtt_port, value 1883
Parameter mqtt_user, value username
Parameter mqtt_pass, value password
Parameter amp_output, value 1
Parameter volume, value 75
Parameter brightness, value 15
Parameter hw_brightness, value 40
Hotword brightness changed
Parameter hotword_detection, value 1
Parameter gain, value 4
Settings changed, saving configuration
Saving configuration
{
"mqtt_host": "XXX.XXX.XXX.XXX",
"mqtt_port": 1883,
"mqtt_user": "username",
"mqtt_pass": "password",
"mute_input": false,
"mute_output": false,
"amp_output": 1,
"brightness": 15,
"hotword_brightness": 40,
"hotword_detection": 1,
"volume": 75,
"gain": 4
}
Enter MQTTDisconnected
Connect failed, retry
Audio connected: 0, Async connected: 1
Enter MQTTDisconnected
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
E (1659:
E (165971) task_wdt: - IDLE0 (CPU 0)
E (165971) task_wdt: Tasks currently running:
E (165971) task_wdt: CPU 0: I2Stask
E (165971) task_wdt: CPU 1: loopTask
E (165971) task_wdt: Aborting.
abort() was called at PC 0x401545e0 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x40089948:0x3ffbfb00 0x40089bc5:0x3ffbfb20 0x401545e0:0x3ffbfb40 0x0

Rebooting...
I (10) boot: ESP-IDF v3.1 2nd stage bootloader
I (10) boot: compile time 11:08:24
I (11) boot: Enabling RNG early entropy source...
I (14) boot: SPI Speed : 40MHz
I (18) boot: SPI Mode : DIO
I (22) boot: SPI Flash Size : 4MB
I (26) boot: Partition Table:
I (29) boot: ## Label Usage Type ST Offset Length
I (37) boot: 0 nvs WiFi data 01 02 00009000 00005000
I (44) boot: 1 otadata OTA data 01 00 0000e000 00002000
I (52) boot: 2 app0 OTA app 00 10 00010000 001e0000
I (59) boot: 3 app1 OTA app 00 11 001f0000 001e0000
I (67) boot: 4 eeprom Unknown data 01 99 003f0000 00001000
I (74) boot: 5 spiffs Unknown data 01 82 003f1000 0000f000
I (82) boot: End of partition table
I (86) boot: No factory image, trying OTA 0
I (91) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x402c4 (26p
I (192) esp_image: segment 1: paddr=0x000502ec vaddr=0x3ffbdb60 size=0x03ea8 ( d
I (198) esp_image: segment 2: paddr=0x0005419c vaddr=0x40080000 size=0x00400 ( d
I (199) esp_image: segment 3: paddr=0x000545a4 vaddr=0x40080400 size=0x0ba6c ( d
I (227) esp_image: segment 4: paddr=0x00060018 vaddr=0x400d0018 size=0xc6c00 (8p
I (512) esp_image: segment 5: paddr=0x00126c20 vaddr=0x4008be6c size=0x04978 ( d
I (531) boot: Loaded app from partition at offset 0x10000
I (531) boot: Disabling RNG early entropy source...
Booting
Matrix Voice Initialized
Loading configuration
{
"mqtt_host": "XXX.XXX.XXX.XXX",
"mqtt_port": 1883,
"mqtt_user": "username",
"mqtt_pass": "password",
"mute_input": false,
"mute_output": false,
"amp_output": 1,
"brightness": 15,
"hotword_brightness": 40,
"hotword_detection": 1,
"volume": 75,
"gain": 4
}
Creating I2Stask
Enter WifiDisconnected
Total heap: 272184
Free heap: 188688
Enter WifiConnected
Connected to Wifi with IP: YYY.YYY.YYY.YYY, SSID: WIFI_SSID, BSSID: AA:AA:AA:AA:A
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
Enter MQTTConnected
Connected as satellite
Enter Idle

Manually woke up Rhasspy via button in web interface -> Play recording is hissing -> download as WAV and playing on the PC is perfectly fine.

Volume moved to 100:

Parameter mqtt_host, value XXX.XXX.XXX.XXX
Parameter mqtt_port, value 1883
Parameter mqtt_user, value username
Parameter mqtt_pass, value password
Parameter amp_output, value 1
Parameter volume, value 100
Volume changed
Parameter brightness, value 15
Parameter hw_brightness, value 40
Parameter hotword_detection, value 1
Parameter gain, value 4
Settings changed, saving configuration
Saving configuration
{
"mqtt_host": "XXX.XXX.XXX.XXX",
"mqtt_port": 1883,
"mqtt_user": "username",
"mqtt_pass": "password",
"mute_input": false,
"mute_output": false,
"amp_output": 1,
"brightness": 15,
"hotword_brightness": 40,
"hotword_detection": 1,
"volume": 100,
"gain": 4
}
Enter MQTTDisconnected
Connect failed, retry
Audio connected: 0, Async connected: 1
Enter MQTTDisconnected
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
Enter Md
Connected as satellite
[E][AsyncTCP.cpp:885] _lwip_fin(): 0x3fff4834 != 0x3fff49d8
Enter Idle
[E][AsyncTCP.cpp:953] _poll(): 0x3fff4834 != 0x3fff49d8
[E][AsyncTCP.cpp:953] _poll(): 0x3fff4834 != 0x3fff49d8
[E][AsyncTCP.cpp:953] _poll(): 0x3fff4834 != 0x3fff49d8
[E][AsyncTCP.cpp:953] _poll(): 0x3fff4834 != 0x3fff49d8

At this point [E][AsyncTCP.cpp:953] _poll(): 0x3fff4834 != 0x3fff49d8 was repeating once every 0.5 seconds

Power cycle matrix voice ( removing it from the header and getting it back in)
TTS -> no problems works flawlessly now

Connected as satellite
Enter Idle
Samplerate: 22050, Channels: 1, Format: 1, Bits per Sample: 16
Done
Using the wake word same thing with the hissing
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Buffer underflow
Enter HotwordDetected
Buffer underflow
Done
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Done
Enter Idle
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Done

At this point I cleared again the flash and wrote the same firmware to test...
If I don't touch anything TTS worked ok, Rhasspy wakeup sounds are broken.
gain to 5 -> TTS OK
Volume change-> crash, but after the reboot TTS is still OK

[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][WiFiClient.cpp:463] available(): fail on fd -1, errno: 11, "No more process"
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
E (154106) task_wdt: Task watchdog got triggered. The following tasks did not r:
E (154106) task_wdt: - IDLE0 (CPU 0)
E (154106) task_wdt: Tasks currently running:
E (154106) task_wdt: CPU 0: I2Stask
E (154106) task_wdt: CPU 1: IDLE1
E (154106) task_wdt: Aborting.
abort() was called at PC 0x401545e0 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x40089948:0x3ffbfb00 0x40089bc5:0x3ffbfb20 0x401545e0:0x3ffbfb40 0x0

Rebooting...
I (10) boot: ESP-IDF v3.1 2nd stage bootloader
I (10) boot: compile time 11:08:24
I (11) boot: Enabling RNG early entropy source...
I (14) boot: SPI Speed : 40MHz
I (18) boot: SPI Mode : DIO
I (22) boot: SPI Flash Size : 4MB
I (26) boot: Partition Table:
I (29) boot: ## Label Usage Type ST Offset Length
I (37) boot: 0 nvs WiFi data 01 02 00009000 00005000
I (44) boot: 1 otadata OTA data 01 00 0000e000 00002000
I (52) boot: 2 app0 OTA app 00 10 00010000 001e0000
I (59) boot: 3 app1 OTA app 00 11 001f0000 001e0000
I (67) boot: 4 eeprom Unknown data 01 99 003f0000 00001000
I (74) boot: 5 spiffs Unknown data 01 82 003f1000 0000f000
I (82) boot: End of partition table
I (86) boot: No factory image, trying OTA 0
I (91) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x402c4 (26p
I (192) esp_image: segment 1: paddr=0x000502ec vaddr=0x3ffbdb60 size=0x03ea8 ( d
I (198) esp_image: segment 2: paddr=0x0005419c vaddr=0x40080000 size=0x00400 ( d
I (199) esp_image: segment 3: paddr=0x000545a4 vaddr=0x40080400 size=0x0ba6c ( d
I (227) esp_image: segment 4: paddr=0x00060018 vaddr=0x400d0018 size=0xc6c00 (8p
I (512) esp_image: segment 5: paddr=0x00126c20 vaddr=0x4008be6c size=0x04978 ( d
I (531) boot: Loaded app from partition at offset 0x10000
I (531) boot: Disabling RNG early entropy source...
Booting
Matrix Voice Initialized
Loading configuration

Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16

I see the samplerate of 44100, which is known to cause hissing.

As far as I can tell now, it might be related to a combination of volume and gain.
This is because with a fresh flash and no adjustments, TTS works OK.
The hissing from the wake wavs is most probably due to the 44100. Can you change those to 22050 and try again?

My focus will be on changing the settings for volume and gain (I have never actually used gain and do not know if it even works).
Then I will see if I can reproduce the hissing and also the crashes.

Yes, as I saw the 44100 in the log I knew what was needed :)
I resampled the three wav files down to 22050 and changed them in Rhasspy's web interface under Settings -> Sound and now it is all good :)
I don't know how after the first erase and reflash I broke the TTS sounds, and how a restart helped and why the second time there were no problems with the TTS...
Maybe some of the other actions broke it the first time ?
Anyway thanks a lot :)
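
For anyone who wants to script the resampling step instead of using an audio editor, here is a rough sketch that halves the sample rate of a 16-bit PCM WAV (for example 44100 -> 22050 Hz) using only the standard library. The function name `halve_sample_rate` is hypothetical, and averaging neighbouring frames is a crude substitute for the proper low-pass filtering a real resampler such as Audacity or sox applies, so treat it as an illustration only.

```python
import array
import io
import sys
import wave

def halve_sample_rate(wav_bytes):
    """Crudely downsample 16-bit PCM WAV data by a factor of two."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        channels = src.getnchannels()
        rate = src.getframerate()
        if src.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are little-endian

    out = array.array("h")
    for i in range(len(samples) // (2 * channels)):
        base = i * 2 * channels
        for ch in range(channels):
            # Average two neighbouring frames of the same channel.
            out.append((samples[base + ch] + samples[base + channels + ch]) // 2)
    if sys.byteorder == "big":
        out.byteswap()

    buf = io.BytesIO()
    with wave.open(buf, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(2)
        dst.setframerate(rate // 2)
        dst.writeframes(out.tobytes())
    return buf.getvalue()
```

The function takes and returns the complete file contents as bytes, so it can be applied to a file read with `open(path, "rb").read()` and the result written back out under a new name.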

I am waiting on a new rPi to test. The one I used to set up the Matrix is currently heavily integrated into a 3D printer.

@whinis did you have the chance to retest?

I have been unable to get the serial output on my Pi Zero W to work, and I have not been able to figure out why.
In the meantime I purchased 3 of the Echos, and playing long audio files causes them to crackle loudly while the song can be heard in the background of the crackling. Changing the volume also seems to have no effect, similar to the Matrix.

Is there a possibility to attach the audio files?

Not due to copyright, it's from my audio library. I am looking for some open license songs to replicate the issue with so that I may

I found this royalty free music on pixabay by Michael Kobrin. Used Audacity to take the mp3 and turn it into a wav and resampled to 16000. To my ears on my desktop it sounds no different, but I'm also not an audiophile.

I then use this python to send the result to mqtt

import paho.mqtt.publish as publish

# Read the whole WAV file, header included
with open("<musicFolder>\\nightlife-michael-kobrin-95bpm-3783.wav", "rb") as f:
    data = f.read()
# Publish it as a single playBytes payload
publish.single("hermes/audioServer/echo_voice_livingroom/playBytes/myID", payload=data, qos=0, retain=False,
               hostname="homeassistant.local", port=1883, keepalive=60,
               auth={"username": "test", "password": "alsotest"}, transport="tcp")

And the result is lots of popping and crackling with the occasional note of the song playing on my Echo
https://cloud.whinis.com/index.php/s/sNiYCSAtzKYTP2r (I tried attaching the file but just kept getting an "is not included in the list" error)

The problem seems less bad if you make the wav a mono file but still bad.

Ok, thnx. I will see if I can reproduce and maybe fix it :)

Not sure if it's related or I am just playing audio wrong, but with the following code it seems all 3 Echos I have and the Matrix are just outputting clicking on the microphone as I try to debug why my hotword is not working

```python
from pydub import AudioSegment
from pydub.playback import play
import paho.mqtt.client as mqtt

topic = "hermes/audioServer/echo_voice_bedroom/audioFrame"

user = "test"
pw = "test"
host = "homeassistant.local"
port = 1883

def on_message(client, obj, msg: mqtt.MQTTMessage):
    # Treat every MQTT payload as raw audio data and play it
    sound = AudioSegment(
        # raw audio data (bytes)
        data=msg.payload,

        # 2 byte (16 bit) samples
        sample_width=2,

        # 16 kHz frame rate
        frame_rate=16000,

        # mono
        channels=1
    )
    play(sound)

mqttc = mqtt.Client()
mqttc.on_message = on_message
mqttc.username_pw_set(user, pw)
mqttc.connect(host, port)

mqttc.subscribe(topic, 0)

rc = 0

# Process network traffic until the client reports an error
while rc == 0:
    rc = mqttc.loop()
print("rc: " + str(rc))
```

The devices are sending a huge number of small wave files, which your code tries to play. That is probably not working, and you are also not parsing the wave header.

Try this script:
https://github.com/Romkabouter/ESP32-Rhasspy-Satellite/blob/voco/record.py
It saves a couple of seconds to a file.

But your topic is for the OUTPUT, so the recording of the mic. I do not know if that is what you want.
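
To illustrate the header-parsing point: if each `audioFrame` payload is a complete small WAV file, as described above, the PCM data has to be extracted from every payload before it can be joined or played. The sketch below is not the contents of `record.py`; it is a hypothetical helper, `merge_wav_frames`, that assumes all payloads share the same format and uses only the standard library.

```python
import io
import wave

def merge_wav_frames(payloads):
    """Join several small WAV payloads into one WAV file (returned as bytes)."""
    params = None
    pcm = bytearray()
    for payload in payloads:
        # wave.open parses the header, so only sample data is collected.
        with wave.open(io.BytesIO(payload), "rb") as chunk:
            if params is None:
                params = (chunk.getnchannels(), chunk.getsampwidth(), chunk.getframerate())
            pcm += chunk.readframes(chunk.getnframes())
    if params is None:
        raise ValueError("no payloads given")

    out = io.BytesIO()
    with wave.open(out, "wb") as merged:
        merged.setnchannels(params[0])
        merged.setsampwidth(params[1])
        merged.setframerate(params[2])
        merged.writeframes(bytes(pcm))
    return out.getvalue()
```

In an MQTT callback, the payloads could be appended to a list and merged once a few seconds of audio have been collected, which avoids trying to play hundreds of tiny clips one by one.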

> I found this royalty free music on pixabay by Michael Kobrin. Used Audacity to take the mp3 and turn it into a wav and resampled to 16000.
> And the result is lots of popping and crackling with the occasional note of the song playing on my echo
> https://cloud.whinis.com/index.php/s/sNiYCSAtzKYTP2r ( I tried attaching the file but just kept getting a is not included in the list error)

The sample you provided is 44100 Hz.

That is a known issue, but when I resample it to 11025 I also hear the issue. I have no solution yet

> The devices are sending a huge amount of small wave files, which you code tries to play. That is probably not working, you are also not parsing the wave header
>
> Try this script:
> https://github.com/Romkabouter/ESP32-Rhasspy-Satellite/blob/voco/record.py
> It save a couple of seconds to a file
>
> But you topic is for the OUTPUT, so the recording of the mic, I do not know if that is what you want.

Yes, this worked much better. It seems none of the Echos have any real pickup. I can open that as another issue, but being right next to it and regardless of the gain setting I can barely hear myself in the recording. Meanwhile I am loud and clear on the Matrix Voice at the same distance.

> It seems none of the Echos have any real pickup. I can open that as another issue but being right next to it and regardless of gain setting I can barely hear myself in the recording. Meanwhile I am loud and clear in the matrix voice at the same distnace

This issue is for audioOutput, so if you want you can open a separate issue. Although I just use the I2S code from the examples from M5 themselves and my hotword is triggering, I also found the volume very low on recordings. It might be an issue with the device itself, which I cannot fix.

Good news and bad news.
The good news is, I found the issue
The bad news is, that I cannot think of a way to solve it.

What happens is that when a large file is being received, the incoming data arrives faster than the output can write it.
This is due to the sample rate.

So what I have is a delay to hold back the ringbuffer push, but that has this very annoying side effect.
That delay is actually listed in the Getting started guide as something not to do:
http://marvinroger.viewdocs.io/async-mqtt-client/1.-Getting-started/

I need to find a solution for the fact that the async lib is processing incoming data faster than the audio task writes the data to the speakers.
I have a very large ringbuffer (60000 bytes), but it fills faster than it is emptied.
The ringbuffer also cannot be made much bigger due to memory limitations.

Could you empty part of the ring buffer if it fills? The idea being that 250 or 500 bytes may be a few ms of sound, so losing it shouldn't be very noticeable unless one looks for it. If the ring buffer fills, you clear out 250-500 bytes ahead to give it room and continue on.

Ideally I would like to use this at some point for phone calls or music played through my homeassistant server.

Yes, you can empty it, but then that audio will be missing. You will absolutely hear a few missing ms of sound.
And also, 500 bytes is not nearly enough, sadly. The MQTT client receives about 1460 bytes per message.
When I resample the audio file to 11025 Hz and trim it to 4 seconds, the file is 177296 bytes.
With this, the onMQTTMessage callback is called around 121 times. I am now looking for a way to maybe buffer to a file or something, but that really just moves the problem.
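
The numbers quoted above can be checked with a few lines of arithmetic. This sketch assumes the test file is 16-bit stereo, which is not stated explicitly but is consistent with 177296 bytes lasting about 4 seconds at 11025 Hz; the function names are only illustrative.

```python
import math

FILE_BYTES = 177296   # size of the 4-second, 11025 Hz test file quoted above
MQTT_CHUNK = 1460     # approximate bytes delivered per onMQTTMessage call
RING_BUFFER = 60000   # ring buffer size mentioned earlier in the thread

def callbacks_needed(total_bytes, chunk_bytes):
    """Number of callbacks needed to deliver the whole payload."""
    return math.ceil(total_bytes / chunk_bytes)

def seconds_of_audio(num_bytes, sample_rate, channels=2, bytes_per_sample=2):
    """Playback time represented by num_bytes of PCM data."""
    return num_bytes / (sample_rate * channels * bytes_per_sample)

print(callbacks_needed(FILE_BYTES, MQTT_CHUNK))         # 122
print(round(seconds_of_audio(FILE_BYTES, 11025), 2))    # 4.02
print(round(seconds_of_audio(RING_BUFFER, 11025), 2))   # 1.36
```

Under these assumptions the 60000-byte ring buffer holds only about a third of the file, so the rest has to wait somewhere while the first part is still being played.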

Shouldn't the Rhasspy server provide an isochronous stream?
Is it transmitting erratically or at fixed intervals?

No, it just publishes the whole audio data in one payload over MQTT

Is it different with UDP streaming?

Yes, UDP streaming sends raw packets with a certain block length.

do you already have an idea how to implement it?
I think MQTT streaming is a dead end here...

I am bound to MQTT streaming because that is what Rhasspy is using.
Sadly, some functionality is not implemented in arduinoesp32, like vGetTaskByName. If it were, I could suspend the task involved in the AsyncTCP library. I have not found another suitable lib yet, but I am now thinking of forking the code or implementing a dedicated client.

You could use MQTT for transmitting the audio to rhasspy and UDP to receive from it. The Tx side does not need buffering in contrast to the Rx side. With a raw PCM UDP stream however, you would not need large buffers, since rhasspy already does that with its playout buffers also maintaining a constant packet interval...

The question is whether the ESP32 has enough processing power to receive UDP on the same core as the I2S write task...
In addition to the MQTT stuff...

> and UDP to receive from it.

That is currently not possible in Rhasspy. UDP is not a setting for the audio play method.

I see now. I misread the docs...

I tried to increase the DMA buffer size up to 32 blocks of 512 bytes and also made the receiver thread call audioWrite() with 32 times the data. No luck so far. The 44.1kHz sample with the I2S port also at 44.1kHz produces the same sound. I still have to try it with a 16kHz audio sample.

The problem is not the buffer.
The actual issue is that you cannot pause the aSync MQTT task.

This is what happens:

  • the message comes in async (split into about 1460 bytes per call to onMQTTMessage)
  • the buffer is filled up to 60000 bytes and the audio play is started
  • the audio is maybe 22050Hz, so the buffer is filled much faster than it is emptied by the I2S task (it cannot empty faster because it is playing those bytes)
  • when the buffer is full, I had a vTaskDelay, but that actually causes packets coming from the async client to be lost
  • problem: missing audio and crapout sound

I have made a pause function in the async lib to prove my point and indeed, audio plays much better.
There are still some issues, but I was just proving the point for myself.

I need to find a way to synchronize the bytes coming in from MQTT with the playing of the bytes, otherwise there will be packet loss, causing this issue.
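
The effect described here can be illustrated with a small time-stepped model. This is a simplified simulation, not the firmware code: the function name `simulate`, the one-chunk-per-tick arrival rate, and the 441 bytes drained per tick (10 ms of 22050 Hz mono 16-bit audio) are all assumptions chosen for illustration.

```python
def simulate(total_bytes, chunk=1460, capacity=60000,
             drain_per_tick=441, pause_when_full=False):
    """Model a fast network producer feeding a slower audio ring buffer.

    Returns the number of bytes lost because the buffer was full.
    """
    buffered = 0
    remaining = total_bytes
    dropped = 0
    while remaining > 0 or buffered > 0:
        # Producer: the MQTT client tries to deliver one chunk per tick.
        if remaining > 0:
            size = min(chunk, remaining)
            if buffered + size <= capacity:
                buffered += size
                remaining -= size
            elif not pause_when_full:
                # The chunk arrives anyway and has nowhere to go.
                dropped += size
                remaining -= size
            # With pause_when_full the chunk is simply retried next tick.
        # Consumer: the I2S task plays a fixed number of bytes per tick.
        buffered -= min(buffered, drain_per_tick)
    return dropped

print(simulate(177296, pause_when_full=False) > 0)   # True: data is lost
print(simulate(177296, pause_when_full=True))        # 0: nothing is lost
```

In this model, holding back the sender while the buffer is full removes the loss entirely, which matches the observation that pausing the async task made playback much better.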

Please check out https://github.com/Romkabouter/ESP32-Rhasspy-Satellite/releases/tag/v7.6
It should solve the audio playback issues

I will close this, but if it is still an issue, please reopen :)