Romkabouter/ESP32-Rhasspy-Satellite

Hotword detection

sehraf opened this issue · 14 comments

As discussed in #23 (comment), I imagine the following scenarios:

  • external detection (e.g. via Rhasspy)
    • In the idle state the device streams the audio data.
    • When Rhasspy detects the hotword, the device switches state to change the LEDs but continues streaming.
    • When Rhasspy signals the end of recording, the device goes back to the idle state.
  • internal detection
    • In the idle state, the device listens for the hotword but does not (necessarily) stream the audio data.
    • When the device detects the hotword, it changes state, switches the LEDs, signals Rhasspy that a hotword was detected/should listen and starts streaming the audio data.
    • When Rhasspy signals the end of recording, the device goes back to the idle state.
  • detection based on hardware feature (e.g. button) [A]
    • In the idle state, the device does nothing.
    • When the hardware feature is triggered, the device changes state, switches the LEDs, signals Rhasspy that a hotword was detected/should listen and starts streaming the audio data.
    • When Rhasspy signals the end of recording, the device goes back to the idle state.
  • detection based on hardware feature (e.g. button) [B]
    • In the idle state, the device does nothing.
    • As long as the hardware feature is triggered, the device changes state, switches the LEDs, signals Rhasspy that a hotword was detected/should listen and starts streaming the audio data.
    • When the hardware feature is not triggered anymore, the device goes back to the idle state.
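
The four scenarios above can be read as a small two-state machine per detection mode. A minimal sketch in Python, assuming hypothetical mode, state and event names (none of them come from the satellite firmware), and assuming that internal detection does not stream while idle:

```python
from enum import Enum, auto


class Mode(Enum):
    EXTERNAL = auto()      # hotword detected by Rhasspy
    INTERNAL = auto()      # hotword detected on the device
    BUTTON_PRESS = auto()  # scenario A: a trigger starts a session
    BUTTON_HOLD = auto()   # scenario B: record only while triggered


class State(Enum):
    IDLE = auto()
    LISTENING = auto()


# Event that moves each mode from IDLE to LISTENING (hypothetical names).
START_EVENT = {
    Mode.EXTERNAL: "rhasspy_hotword",
    Mode.INTERNAL: "local_hotword",
    Mode.BUTTON_PRESS: "button_down",
    Mode.BUTTON_HOLD: "button_down",
}

# Event that moves each mode from LISTENING back to IDLE.
STOP_EVENT = {
    Mode.EXTERNAL: "end_of_recording",
    Mode.INTERNAL: "end_of_recording",
    Mode.BUTTON_PRESS: "end_of_recording",
    Mode.BUTTON_HOLD: "button_up",
}


def next_state(mode, state, event):
    """Return the state after `event`; unrelated events leave it unchanged."""
    if state is State.IDLE and event == START_EVENT[mode]:
        return State.LISTENING
    if state is State.LISTENING and event == STOP_EVENT[mode]:
        return State.IDLE
    return state


def streams_audio(mode, state):
    """Only external detection streams while idle; every mode streams while listening."""
    return state is State.LISTENING or mode is Mode.EXTERNAL
```

In this sketch the only difference between scenarios A and B is the stop event: A waits for Rhasspy to signal the end of recording, B stops as soon as the hardware feature is released.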

Any other ideas?

External detection is already done; the internal detection code must still be ported to the rewrite branch but is already in master.
However, only "Alexa" is working now, but I saw some other methods with TensorFlow.

The detection in situation A is also already implemented, I will test this some more since you mentioned it was not working for you.
I don't think situation B is a good use case for button B; Rhasspy will respond to the first hotword-detected signal.
So repeating that is not good practice.
Also, the button(s) should not trigger hotword detection, but start a session on Rhasspy. Starting a session on Rhasspy will trigger a hotword/toggle message, which in turn triggers the hotwordstate in the esp satellite.

Maybe you can elaborate a bit more on that use case?

External detection is already done; the internal detection code must still be ported to the rewrite branch but is already in master.

Sure, I just wanted to keep everything in one place.

However, only "Alexa" is working now, but I saw some other methods with TensorFlow.

I would appreciate anything other than Alexa 😄

I don't think situation B is a good use case for button B; Rhasspy will respond to the first hotword-detected signal.
So repeating that is not good practice.

My idea was to give the user full control over the audio recording. I won't use it myself but thought it might be useful to someone.

Also, the button(s) should not trigger hotword detection, but start a session on Rhasspy

That's what I meant to say.

Starting a session on Rhasspy will trigger a hotword/toggle message, which in turn triggers the hotwordstate in the esp satellite.

OK, haven't seen that yet.

Maybe you can elaborate a bit more on that use case?

Sure, the first and second ones are basically like any other assistant: you just talk and things happen. Only the "location" of the hotword detection changes. The second one also keeps the Wi-Fi clear. As long as the hardware is capable, I don't see any reason to stream audio 24/7.
The third and fourth are for users who either have no hardware capable of hotword detection or just want to make sure that audio is only recorded when necessary. As pointed out above, the fourth use case gives full control over the length of the recording.

Yeah, well, Alexa is the only English hotword supported by WakeNet at the moment ;)
OK, for user control over recording there is a muteInput function :)

For your hotword trigger button, can you check whether a message is sent to your broker on hermes/dialogueManager/startSession? If it is not, isHotwordDetected() on your device is not returning true for some reason.
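
As a rough illustration of that check, the sketch below builds a start-session message and tests whether a captured topic/payload pair is one for a given site. The payload fields (`siteId`, `init.type`, `init.canBeEnqueued`) are an assumption based on the Hermes protocol, not something quoted in this thread, and the helper names are hypothetical. It uses only the standard library, so it does not talk to a broker itself.

```python
import json

START_SESSION_TOPIC = "hermes/dialogueManager/startSession"


def build_start_session(site_id):
    """Return (topic, payload) for a button-triggered session (assumed payload shape)."""
    payload = {
        "siteId": site_id,
        "init": {"type": "action", "canBeEnqueued": False},
    }
    return START_SESSION_TOPIC, json.dumps(payload)


def is_start_session_for(topic, payload, site_id):
    """True if a captured MQTT message is a startSession for `site_id`."""
    if topic != START_SESSION_TOPIC:
        return False
    try:
        data = json.loads(payload)
    except ValueError:
        # Not valid JSON, so it cannot be the message we are looking for.
        return False
    return isinstance(data, dict) and data.get("siteId") == site_id
```

Feeding the messages seen on the broker through `is_start_session_for(topic, payload, "default")` would confirm that the button press reaches MQTT with the expected siteId.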

For your hotword trigger button, can you check whether a message is sent to your broker on hermes/dialogueManager/startSession?

I do get this message, but it seems to have no effect.
[Screenshot from 2021-01-03 18-48-25]

This is my Rhasspy configuration
[Screenshot from 2021-01-03 18-49-32]

Wakeword should be set on the server, and in every setting your satellite's siteId should be listed under satellites.
In your case, default. When you get the message, the button is working correctly.
I will experiment with the wakeword setting. It might work without it, but the HotwordDetected state is triggered by a message on hermes/hotword/toggleOff containing your siteId and dialogueSession as the reason.
That message might not be sent when wakeword is disabled.
In any case, your satellite's siteId should be filled in under Satellite siteIds.
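
The trigger condition described here can be sketched as a small predicate. This is an illustration only: the JSON field names `siteId` and `reason` are assumed from the Hermes protocol, the function name is hypothetical, and the actual satellite firmware may parse the message differently.

```python
import json

TOGGLE_OFF_TOPIC = "hermes/hotword/toggleOff"


def triggers_hotword_state(topic, payload, site_id):
    """True if the message should put the satellite with `site_id` into its hotword state."""
    if topic != TOGGLE_OFF_TOPIC:
        return False
    try:
        data = json.loads(payload)
    except ValueError:
        return False
    if not isinstance(data, dict):
        return False
    # Both conditions from the thread: matching siteId and reason "dialogueSession".
    return data.get("siteId") == site_id and data.get("reason") == "dialogueSession"
```

For example, a toggleOff message for siteId `default` with another reason, or one addressed to a different site, would be ignored by this check.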

Maybe filling in default in Satellite siteIds under Dialogue Management is enough; I have not tried with wakeword disabled.

In any case, your satellite's siteId should be filled in under Satellite siteIds.

It's set to default on all ends I'm aware of.

What is the siteId of your server?

[Screenshot from 2021-01-03 19-28-18]
Is it this field? Or is there another?

OK, I see you have indeed set default. Then there should be no need to fill in the Satellite siteIds.
Be aware that this should be changed when using multiple satellites.

Does your Rhasspy respond at all? If so, everything is working correctly and the cause is most probably the disabled wakeword setting. Please check the results with it set (to anything, because you are triggering via a button anyway).

Oh wait, Dialogue Management must be set to Rhasspy as well (important).

Oh wait, Dialogue Management must be set to Rhasspy as well (important).

THIS! Thanks a lot. That was the missing piece.

I'm using this configuration now:

  • Audio Recording ➡️ Hermes MQTT
  • Wake Word ➡️ Hermes MQTT
  • Audio Playing ➡️ Hermes MQTT
  • Dialogue Management ➡️ Rhasspy

So scenario 3 works (with constant audio streaming).
Also, on a side note: I get a lot of disturbance and many buffer underflows when Rhasspy plays its sounds. But this is for another issue.

What is the sample rate of the audio files? Less than 44100 should not be a problem on a good Wi-Fi connection.
The incoming audio is buffered, but the async MQTT client must receive and process the bytes fast enough or there will be buffer underflows.

If you want you can create a new issue with all details, I will close this one then since it's resolved :)

For the record, you do need wakeword activated when using a hardware button (I've checked it).

What is the sample rate of the audio files?

Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16

I will close this one then since it's resolved :)

WakeNet working already? 😛

For the record, you do need wakeword activated when using a hardware button (I've checked it).

What do you mean? The Rhasspy option? I've set that to Hermes (since the wakeword notification comes through Hermes)

WakeNet is working on the master branch ;)
Indeed the Rhasspy setting for wakeword can be disabled, I have just confirmed that with the M5.

The sample rate is indeed the issue; try lowering it to a maximum of 22050. This is also an issue on the master branch, but maybe the MQTT client is just too slow or something. I have spent ages trying to fix it, but if the bytes do not arrive in the buffer fast enough, this will always be an issue :\
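
To put numbers on this, the raw PCM data rate follows directly from sample rate, channel count and bit depth. The sketch below is plain arithmetic under the assumption that the audio arrives as uncompressed PCM; it ignores WAV headers and MQTT overhead, and the function name is hypothetical.

```python
def pcm_bytes_per_second(sample_rate, channels, bits_per_sample):
    """Raw PCM throughput the playback buffer has to sustain, in bytes per second."""
    return sample_rate * channels * bits_per_sample // 8


# Format reported in this thread: 44100 Hz, 2 channels, 16 bit.
print(pcm_bytes_per_second(44100, 2, 16))  # 176400
# Same layout at the suggested maximum of 22050 Hz.
print(pcm_bytes_per_second(22050, 2, 16))  # 88200
```

Halving the sample rate halves the number of bytes the MQTT client must deliver per second, which is why lowering it can reduce underflows.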

I have just found that the current WakeNet does not work with the latest framework. I will drop support (since it is only Alexa anyway) and hope Porcupine will add a library soon.