Remote voice satellite using the Wyoming protocol.
- Works with Home Assistant
- Local wake word detection using Wyoming services
- Audio enhancements using webrtc
See the tutorial to build a satellite using a Raspberry Pi Zero 2 W and a ReSpeaker 2Mic HAT.
Requires:
- Python 3.7+ (tested on 3.9+)
- A microphone
script/setup
The examples below use alsa-utils to record and play audio:
sudo apt-get install alsa-utils
Run the satellite with remote wake word detection:
cd wyoming-satellite/
script/run \
--name 'my satellite' \
--uri 'tcp://0.0.0.0:10700' \
--mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw' \
--snd-command 'aplay -r 22050 -c 1 -f S16_LE -t raw'
This will use the default microphone and playback devices.
Use arecord -D <DEVICE> ... if you need to use a different microphone (list them with arecord -L and prefer plughw: devices).
Use aplay -D <DEVICE> ... if you need to use a different playback device (list them with aplay -L and prefer plughw: devices).
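As a rough illustration of how the -D option fits into the microphone command, the sketch below assembles the string passed to --mic-command. The helper name arecord_command and the device name plughw:CARD=Device,DEV=0 are made up for this example; use a device name reported by arecord -L on your own system.

```python
import shlex
from typing import Optional


def arecord_command(device: Optional[str] = None, rate: int = 16000, channels: int = 1) -> str:
    """Build an arecord command line for raw 16-bit little-endian audio."""
    args = ["arecord"]
    if device is not None:
        # -D selects the ALSA device; omit it to use the default microphone
        args += ["-D", device]
    args += ["-r", str(rate), "-c", str(channels), "-f", "S16_LE", "-t", "raw"]
    # Quote each argument so unusual device names survive shell parsing
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical device name, for illustration only
print(arecord_command("plughw:CARD=Device,DEV=0"))
```

The same pattern should apply to aplay for --snd-command, with the playback sample rate (22050 in the examples above) in place of 16000.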
Add --debug to print additional logs.
In the Home Assistant settings "Devices & services" page, you should see the satellite discovered automatically. If not, click "Add Integration", choose "Wyoming Protocol", and enter the IP address of the satellite (port 10700).
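If the satellite is not discovered and adding it manually fails, it can help to confirm that port 10700 is reachable from another machine on the network. The following is a minimal sketch using only the Python standard library; the function name port_is_open is made up for this example, and the IP address in the usage comment is a placeholder.

```python
import socket


def port_is_open(host: str, port: int = 10700, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


# Example usage (replace the placeholder with the satellite's IP address):
#   print(port_is_open("192.168.1.50"))
```

A result of False only means the TCP connection could not be opened; it does not say whether the cause is the satellite not running, a firewall, or a wrong address.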
Audio will be continuously streamed to the server, where wake word detection, etc. will occur.
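To get a feel for what continuous streaming costs, the raw audio data rate can be estimated from the arecord and aplay settings used above. This is back-of-the-envelope arithmetic for the uncompressed PCM payload only; any protocol overhead is not included, and the function name is made up for this example.

```python
def raw_pcm_bytes_per_second(rate: int, channels: int, sample_width: int = 2) -> int:
    """Bytes per second of uncompressed PCM audio (S16_LE uses 2 bytes per sample)."""
    return rate * channels * sample_width


mic = raw_pcm_bytes_per_second(16000, 1)  # arecord -r 16000 -c 1 -f S16_LE
snd = raw_pcm_bytes_per_second(22050, 1)  # aplay -r 22050 -c 1 -f S16_LE
print(mic, snd)  # 32000 44100
```

So the microphone stream alone is roughly 32 kB of audio data per second for as long as the satellite is streaming.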
Rather than always streaming audio to Home Assistant, the satellite can wait until speech is detected.
NOTE: This will not work on the 32-bit version of Raspberry Pi OS.
Install the dependencies for silero VAD:
.venv/bin/pip3 install 'pysilero-vad==1.0.0'
Run the satellite with VAD enabled:
script/run \
... \
--vad
Now, audio will only start streaming once speech has been detected.
Install a wake word detection service, such as wyoming-openwakeword, and start it:
cd wyoming-openwakeword/
script/run \
--uri 'tcp://0.0.0.0:10400' \
--preload-model 'ok_nabu'
Add --debug to print additional logs. See --help for more information.
Included wake words are:
- ok_nabu
- hey_jarvis
- alexa
- hey_mycroft
- hey_rhasspy
Community-trained wake words are also available and can be included with --custom-model-dir <DIR>, where <DIR> contains .tflite file(s).
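Before pointing --custom-model-dir at a directory, it may be worth checking that it actually contains .tflite files. The sketch below lists them with the standard library. It only looks at the top level of the directory, which is an assumption about how models are laid out, and the function name and the path in the usage comment are made up for this example.

```python
from pathlib import Path
from typing import List


def find_tflite_models(model_dir: str) -> List[str]:
    """Return the sorted names of .tflite files directly inside model_dir."""
    return sorted(p.name for p in Path(model_dir).glob("*.tflite") if p.is_file())


# Example usage (hypothetical path):
#   print(find_tflite_models("/home/pi/custom-wake-words"))
```

An empty list suggests the directory is wrong or the models are stored in a subdirectory.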
Next, start the satellite with some additional arguments:
cd wyoming-satellite/
script/run \
--name 'my satellite' \
--uri 'tcp://0.0.0.0:10700' \
--mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw' \
--snd-command 'aplay -r 22050 -c 1 -f S16_LE -t raw' \
--wake-uri 'tcp://127.0.0.1:10400' \
--wake-word-name 'ok_nabu'
Audio will only be streamed to the server after the wake word has been detected.
Note that --vad is unnecessary when connecting to a local instance of openwakeword.
You can play a WAV file when the wake word is detected (locally or remotely), and when speech-to-text has completed:
- --awake-wav <WAV>: played when the wake word is detected
- --done-wav <WAV>: played when the voice command is finished
If you want to play audio files other than WAV, use event commands. Specifically, use --detection-command to replace --awake-wav and --transcript-command to replace --done-wav.
Install the dependencies for webrtc:
.venv/bin/pip3 install 'webrtc-noise-gain==1.2.3'
Run the satellite with automatic gain control and noise suppression:
script/run \
... \
--mic-auto-gain 5 \
--mic-noise-suppression 2
Automatic gain control ranges from 0 to 31 dBFS, with 31 being the loudest. Noise suppression ranges from 0 to 4, with 4 being maximum suppression (may cause audio distortion).
You can also use --mic-volume-multiplier X to multiply all audio samples by X. For example, using 2 for X will double the microphone volume (but may cause audio distortion). The corresponding --snd-volume-multiplier does the same for audio playback.
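To see why a volume multiplier can distort audio, consider what happens to 16-bit samples when they are scaled. The sketch below is an illustration of the idea, not the satellite's actual implementation; the function name multiply_volume is made up for this example. Samples that would exceed the 16-bit range are clipped, and that clipping is what is heard as distortion.

```python
import struct


def multiply_volume(pcm: bytes, factor: float) -> bytes:
    """Scale 16-bit signed little-endian samples by factor, clipping to the valid range."""
    count = len(pcm) // 2  # two bytes per S16_LE sample
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    # Clamp each scaled sample to [-32768, 32767] so it still fits in 16 bits
    scaled = [max(-32768, min(32767, int(s * factor))) for s in samples]
    return struct.pack(f"<{count}h", *scaled)


quiet_and_loud = struct.pack("<3h", 1000, -20000, 30000)
print(struct.unpack("<3h", multiply_volume(quiet_and_loud, 2)))  # (2000, -32768, 32767)
```

The quiet sample doubles cleanly, while the two loud samples hit the limits of the 16-bit range and lose their original shape.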
Satellites can respond to events from the server by running commands:
- --startup-command: run when satellite starts (no stdin)
- --detect-command: wake word detection has started, but not detected yet (no stdin)
- --streaming-start-command: audio has started streaming to server (no stdin)
- --streaming-stop-command: audio has stopped streaming to server (no stdin)
- --detection-command: wake word is detected (wake word name on stdin)
- --transcript-command: speech-to-text transcript is returned (text on stdin)
- --stt-start-command: user started speaking (no stdin)
- --stt-stop-command: user stopped speaking (no stdin)
- --synthesize-command: text-to-speech text is returned (text on stdin)
- --tts-start-command: text-to-speech response started (no stdin)
- --tts-stop-command: text-to-speech response stopped (no stdin)
- --error-command: an error was sent from the server (text on stdin)
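An event command can be any program that reads its input from stdin. As a minimal sketch, the handler below is meant for the events that pass text on stdin (detection, transcript, synthesize, error) and writes one log line per event. The function name handle_event and the log format are made up for this example.

```python
import io
import sys
from typing import TextIO


def handle_event(event_name: str, stdin: TextIO, log: TextIO) -> str:
    """Read the event text from stdin and write a single log line."""
    text = stdin.read().strip()
    line = f"[{event_name}] {text}" if text else f"[{event_name}]"
    print(line, file=log)
    return line


# Simulate the satellite piping a transcript to the command's stdin
handle_event("transcript", io.StringIO("turn on the lights\n"), sys.stdout)

# When saved as a standalone script, the last line would instead be:
#   handle_event(sys.argv[1], sys.stdin, sys.stderr)
```

Saved under a hypothetical name such as log_event.py, it could then be passed as --transcript-command 'python3 log_event.py transcript', assuming event commands accept arguments in the same way as --mic-command does.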
For more advanced scenarios, use an event service (--event-uri). See wyoming_satellite/example_event_client.py for a basic client that just logs events.