A collection of resources for building a smart speaker. Hopefully we can make an open source one for daily use.
I believe we already have enough resources to build an open source smart speaker, so let's do it. Take a look at the progress of the "smart speaker from scratch" project on Hackaday.
The first kit of the project will be available at the end of November. It is on pre-order now!
A simplified flowchart of a smart speaker looks like this:

    +---+   +----------------+   +---+   +---+   +---+
    |Mic|-->|Audio Processing|-->|KWS|-->|STT|-->|NLU|
    +---+   +----------------+   +---+   +---+   +-+-+
                                                   |
                                                   |
    +-------+   +---+   +----------------------+   |
    |Speaker|<--|TTS|<--|Knowledge/Skill/Action|<--+
    +-------+   +---+   +----------------------+

- Audio Processing includes Acoustic Echo Cancellation (AEC), Beamforming, Noise Suppression (NS), etc.
- Keyword Spotting (KWS) detects a keyword (such as "OK Google" or "Hey Siri") that starts a conversation.
- Speech To Text (STT) transcribes the spoken request into text.
- Natural Language Understanding (NLU) converts raw text into structured data.
- Knowledge/Skill/Action - a knowledge base and plugins (Alexa Skills, Google Actions) that provide an answer.
- Text To Speech (TTS) converts the answer back into speech for the speaker.
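To make the data flow concrete, here is a toy Python sketch of the pipeline above. Every stage is a hypothetical placeholder: "audio frames" are plain strings standing in for PCM buffers, and the wake word `computer` and the `<silence>` end-of-utterance marker (which a VAD would normally supply) are assumptions. None of the projects listed below exposes this API. The sketch only shows the control flow: audio processing runs on every frame, KWS gates STT, and NLU hands structured data to a skill whose text answer goes to TTS.

```python
from typing import Iterable, List

WAKE_WORD = "computer"          # hypothetical keyword
END_OF_UTTERANCE = "<silence>"  # stands in for a VAD end-of-speech decision


def audio_processing(frame: str) -> str:
    """Placeholder for AEC / beamforming / NS; here it only normalizes text."""
    return frame.strip().lower()


def keyword_spotted(frame: str) -> bool:
    """Placeholder KWS: fires when the frame 'contains' the wake word."""
    return frame == WAKE_WORD


def speech_to_text(frames: List[str]) -> str:
    """Placeholder STT: the toy frames already hold words."""
    return " ".join(frames)


def nlu(text: str) -> dict:
    """Placeholder NLU: raw text -> structured intent."""
    if "time" in text:
        return {"intent": "GetTime", "text": text}
    return {"intent": "Unknown", "text": text}


def skill(intent: dict) -> str:
    """Placeholder knowledge/skill layer: intent -> answer text."""
    if intent["intent"] == "GetTime":
        return "It is noon."  # a real skill would read the clock
    return "Sorry, I did not understand."


def text_to_speech(answer: str) -> bytes:
    """Placeholder TTS: a real engine would return synthesized audio."""
    return answer.encode("utf-8")


def run_pipeline(mic: Iterable[str]) -> List[bytes]:
    responses: List[bytes] = []
    listening = False
    utterance: List[str] = []
    for raw in mic:
        frame = audio_processing(raw)
        if not listening:
            # Until the wake word is heard, only KWS looks at the audio.
            listening = keyword_spotted(frame)
            continue
        if frame == END_OF_UTTERANCE:
            text = speech_to_text(utterance)
            responses.append(text_to_speech(skill(nlu(text))))
            listening, utterance = False, []
        else:
            utterance.append(frame)
    return responses
```

For example, `run_pipeline(["music", "Computer", "what", "time", "is", "it", "<silence>"])` returns `[b"It is noon."]`, while the same words without the wake word produce no response.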
- Open source voice assistants
  - Snips ⭐ - the first 100% on-device and private-by-design open-source Voice AI platform
  - Mycroft ⭐ - a hackable open source voice assistant
  - SEPIA 🤖 - highly customizable, open-source, cross-platform voice assistant and VUI framework (HTML + Java + x)
  - Kalliope - a framework for creating your own personal assistant, similar to Mycroft (both are written in Python)
  - dingdang robot - a Chinese (🇨🇳) voice interaction robot based on Jasper and built with Raspberry Pi
- Voice services
  - Amazon Alexa Voice Service - the most widely used voice assistant
  - Google Assistant - has the smartest brain. Its extensions, called Google Actions, can be created in a few steps with Dialogflow, and its Device Actions are well suited to smart home devices.
- Install Snips on Raspberry Pi 3, Linux, macOS, iOS and Android
- Keyword Spotting (KWS)
  - Mycroft Precise - a lightweight, simple-to-use, RNN wake word listener
  - Snowboy - DNN-based hotword and wake word detection toolkit
  - Honk - PyTorch reimplementation of Google's TensorFlow CNNs for keyword spotting
  - ML-KWS-For-MCU - possibly the most promising option for resource-constrained devices such as ARM Cortex-M7 microcontrollers
  - Porcupine - lightweight, cross-platform engine to build custom wake words in seconds
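Many of these engines run a small neural network on every audio frame and output a per-frame keyword probability. A detection is then declared by smoothing those scores over a short window and comparing the result with a threshold. The sketch below shows only that decision step, as a generic technique rather than the logic of any project above. The window length, threshold and refractory period are illustrative values, not tuned defaults.

```python
from collections import deque
from typing import Deque, List, Sequence


def detect_keyword(
    scores: Sequence[float],
    window: int = 5,
    threshold: float = 0.7,
    refractory: int = 20,
) -> List[int]:
    """Return frame indices where the smoothed keyword score reaches `threshold`.

    scores:     per-frame keyword probabilities from a KWS model (0..1).
    window:     number of frames in the moving average.
    refractory: frames to ignore after a detection so that one spoken
                keyword does not trigger several times.
    """
    recent: Deque[float] = deque(maxlen=window)
    detections: List[int] = []
    cooldown = 0
    for i, p in enumerate(scores):
        recent.append(p)
        smoothed = sum(recent) / len(recent)
        if cooldown > 0:
            cooldown -= 1
            continue
        if smoothed >= threshold:
            detections.append(i)
            cooldown = refractory
    return detections
```

Smoothing trades a few frames of latency for robustness: a single high-scoring frame, typically caused by noise, is not enough to wake the device.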
- Speech To Text (STT)
  - Mozilla DeepSpeech - a TensorFlow implementation of Baidu's DeepSpeech architecture
  - Kaldi - a speech recognition toolkit written in C++
  - wav2letter++ - a fast, open source speech processing toolkit from the Speech team at Facebook AI Research, built to facilitate research in end-to-end models for speech recognition
  - Zamia Speech - open tools, data and models (Kaldi models and wav2letter++ models) for cloudless automatic speech recognition. It can run on Raspberry Pi.
  - PocketSphinx - a lightweight speech recognition engine based on HMM-GMM acoustic models
- Natural Language Understanding (NLU)
  - Snips NLU - a Python library that parses sentences written in natural language and extracts structured information
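NLU turns raw text into structured data, typically an intent plus named slots. Libraries such as Snips NLU learn this mapping from annotated example sentences. The sketch below uses hand-written regular expressions instead, only to show the kind of structured output an NLU stage produces. The intent names, slot names and patterns are made up for illustration, and the result format is not Snips NLU's actual schema.

```python
import re
from typing import Dict

# Hypothetical intents: each pattern's named groups become the slots.
INTENT_PATTERNS = [
    ("SetTimer", re.compile(r"set (?:a )?timer for (?P<duration>\d+) (?P<unit>seconds?|minutes?|hours?)")),
    ("PlayMusic", re.compile(r"play (?:some )?(?P<genre>\w+)(?: music)?")),
    ("GetWeather", re.compile(r"weather (?:in|for) (?P<city>[a-z ]+)")),
]


def parse(sentence: str) -> Dict[str, object]:
    """Map a sentence to {"intent": name or None, "slots": {slot: value}}."""
    text = sentence.lower().strip().rstrip("?.!")
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(text)
        if match:
            return {"intent": intent, "slots": match.groupdict()}
    return {"intent": None, "slots": {}}
```

For instance, `parse("Set a timer for 10 minutes")` gives `{"intent": "SetTimer", "slots": {"duration": "10", "unit": "minutes"}}`. Rules like these are brittle with paraphrases, which is why trained NLU models are preferred in practice.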
- Text To Speech (TTS)
  - Mozilla TTS - deep learning for Text to Speech
  - Mimic - Mycroft's TTS engine, based on CMU's Flite (Festival Lite)
  - MaryTTS - an open-source, multilingual text-to-speech synthesis system written in pure Java
  - espeak-ng - an open source speech synthesizer that supports 99 languages and accents
  - ekho - Chinese text-to-speech engine
  - WaveNet, Tacotron 2 - neural speech synthesis models from DeepMind and Google
- Acoustic Echo Cancellation (AEC)
  - SpeexDSP and its Python binding speexdsp-python
  - EC - echo cancellation daemon based on SpeexDSP AEC for Raspberry Pi or other devices running Linux
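An echo canceller models the echo path from the loudspeaker (far-end) signal to the microphone with an adaptive filter, then subtracts the predicted echo from the microphone signal. SpeexDSP uses a frequency-domain MDF (multidelay block) filter. The sketch below substitutes the simpler time-domain NLMS (normalized least mean squares) filter to illustrate the idea. The filter length and step size are illustrative, and a real device would also need double-talk detection and a much longer filter.

```python
from typing import List, Sequence


def nlms_echo_cancel(
    far_end: Sequence[float],
    mic: Sequence[float],
    taps: int = 16,
    mu: float = 0.5,
    eps: float = 1e-6,
) -> List[float]:
    """Return the mic signal with the estimated echo of far_end removed."""
    w = [0.0] * taps       # adaptive estimate of the echo path
    x_hist = [0.0] * taps  # most recent far-end samples, newest first
    out: List[float] = []
    for x, d in zip(far_end, mic):
        x_hist = [x] + x_hist[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x_hist))
        e = d - echo_estimate  # error = near-end speech + residual echo
        norm = eps + sum(xi * xi for xi in x_hist)
        step = mu * e / norm   # step normalized by far-end input power
        w = [wi + step * xi for wi, xi in zip(w, x_hist)]
        out.append(e)
    return out
```

The normalization by input power is what makes the step size independent of the loudspeaker volume, which is why NLMS rather than plain LMS is the usual baseline.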
- Direction Of Arrival (DOA) - a widely used DOA method is GCC-PHAT, which estimates the time difference of arrival (TDOA) between pairs of microphones
  - tdoa
  - odas - ODAS stands for Open embeddeD Audition System. It is a library dedicated to sound source localization, tracking, separation and post-filtering. ODAS is coded entirely in C for portability, is optimized to run easily on low-cost embedded hardware, and is free and open source.
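GCC-PHAT cross-correlates two microphone signals in the frequency domain after normalizing every frequency bin to unit magnitude (the phase transform). This leaves a sharp peak at the delay between the signals. The delay, the microphone spacing and the speed of sound then give the angle. The sketch below is a dependency-free illustration under several assumptions: it uses a naive O(n²) DFT to stay self-contained (a real implementation would use an FFT such as `numpy.fft`), it treats the frame as circular, and it assumes a two-microphone, far-field setup with c = 343 m/s.

```python
import cmath
import math
from typing import List, Optional, Sequence


def _dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive DFT; fine for short frames, use an FFT in practice."""
    n = len(x)
    sign = 1 if inverse else -1
    twiddle = [cmath.exp(sign * 2j * math.pi * m / n) for m in range(n)]
    out = [sum(x[t] * twiddle[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def gcc_phat(sig: Sequence[float], ref: Sequence[float], max_lag: Optional[int] = None) -> int:
    """Delay of `sig` relative to `ref` in samples (positive: sig arrives later)."""
    n = len(sig)
    X, Y = _dft(sig), _dft(ref)
    cross = []
    for a, b in zip(X, Y):
        r = a * b.conjugate()
        mag = abs(r)
        cross.append(r / mag if mag > 1e-12 else 0j)  # phase transform: keep phase only
    cc = [v.real for v in _dft(cross, inverse=True)]
    if max_lag is None:
        max_lag = n // 2
    # Negative lags wrap around to the end of the circular correlation.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: cc[lag % n])


def tdoa_to_angle(lag: int, fs: float, mic_distance: float, c: float = 343.0) -> float:
    """Far-field angle in degrees from broadside for a two-microphone array."""
    ratio = c * (lag / fs) / mic_distance
    return math.degrees(math.asin(max(-1.0, min(1.0, ratio))))
```

In practice `max_lag` should be limited to `mic_distance * fs / c` samples, since larger delays are physically impossible for the array and only add spurious peaks.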
- Beamforming
  - BeamformIt - filter-and-sum beamforming
  - CGMM Beamforming - a reference implementation
  - MVDR Beamforming
  - GSC Beamforming
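BeamformIt implements filter-and-sum (weighted delay-and-sum) beamforming. Its simplest relative, delay-and-sum, is easy to sketch: time-align each channel by its steering delay (for example one estimated with GCC-PHAT) and average, so the target adds up coherently while uncorrelated noise partly cancels. The sketch assumes integer-sample delays that are already known; BeamformIt's channel weighting and delay tracking are not reproduced.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Align and average channels, where channels[m][t] ~ s[t - delays[m]].

    Samples that fall outside a channel's buffer contribute zero, so the
    last few output samples are attenuated.
    """
    n = len(channels[0])
    out: List[float] = []
    for t in range(n):
        acc = 0.0
        for x, d in zip(channels, delays):
            idx = t + d  # undo this channel's delay
            if 0 <= idx < n:
                acc += x[idx]
        out.append(acc / len(channels))
    return out
```

With M microphones and uncorrelated noise, averaging reduces the noise power by roughly a factor of M while the aligned target is unchanged. MVDR and GSC beamformers go further by adapting to correlated noise.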
- Voice Activity Detection (VAD)
  - WebRTC VAD and its Python binding py-webrtcvad
  - DNN VAD
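WebRTC VAD classifies each 10, 20 or 30 ms frame with a statistical (Gaussian mixture) model of subband energies. As a much simpler stand-in that shows the same frame-by-frame structure, the sketch below flags frames whose energy exceeds a fixed threshold and keeps the decision on for a few "hangover" frames, so that short pauses inside speech are not cut off. The frame length, threshold and hangover are arbitrary assumptions, and this is not WebRTC's algorithm.

```python
import math
from typing import List, Sequence


def energy_vad(
    samples: Sequence[float],
    frame_len: int = 160,         # 10 ms at 16 kHz
    threshold_db: float = -40.0,  # relative to full scale (|x| <= 1.0)
    hangover: int = 3,
) -> List[bool]:
    """Return one speech/non-speech flag per complete frame."""
    flags: List[bool] = []
    remaining = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        level_db = 10 * math.log10(energy + 1e-12)
        if level_db > threshold_db:
            remaining = hangover
            flags.append(True)
        elif remaining > 0:
            remaining -= 1
            flags.append(True)  # hangover keeps short gaps marked as speech
        else:
            flags.append(False)
    return flags
```

A fixed energy threshold breaks down as soon as the background noise level changes, which is the main reason model-based (WebRTC) and DNN VADs exist.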
- Noise Suppression (NS)
  - NS of WebRTC audio processing and its Python binding python-webrtc-audio-processing
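Single-channel noise suppressors, such as the one in WebRTC audio processing, typically work in the short-time Fourier domain. They track a noise power estimate per frequency bin and apply a gain that attenuates bins dominated by noise. A generic Wiener-style gain, shown here as textbook background rather than WebRTC's exact rule, is:

$$
\hat{\xi}(k) = \max\!\left(\frac{|X(k)|^2 - \hat{\lambda}_N(k)}{\hat{\lambda}_N(k)},\ 0\right), \qquad
G(k) = \max\!\left(\frac{\hat{\xi}(k)}{1 + \hat{\xi}(k)},\ G_{\min}\right), \qquad
\hat{S}(k) = G(k)\,X(k)
$$

Here $X(k)$ is the noisy spectrum in bin $k$, $\hat{\lambda}_N(k)$ is the estimated noise power (for example averaged over frames that a VAD marks as non-speech), $\hat{\xi}(k)$ is an instantaneous SNR estimate, and $G_{\min}$ is a gain floor that limits "musical noise" artifacts. The enhanced spectrum $\hat{S}(k)$ is transformed back to the time domain with overlap-add.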
- Audio I/O
  - PortAudio and its Python binding pyaudio
  - libsoundio
  - ALSA
  - PulseAudio
  - PipeWire
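These APIs deliver audio to the application as PCM buffers. For speech, 16-bit little-endian samples (PyAudio's `paInt16`, ALSA's `S16_LE`) are the usual choice. Recording requires real hardware, so the sketch below sticks to the Python standard library. It converts such a raw buffer to floats in [-1, 1) for processing, converts the floats back, and saves the buffer as a WAV file with the `wave` module. The helper names are hypothetical, and mono 16 kHz is an assumed default.

```python
import array
import io
import sys
import wave
from typing import BinaryIO, List, Union


def pcm16_to_float(buf: bytes) -> List[float]:
    """Little-endian 16-bit PCM bytes -> floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(buf)
    if sys.byteorder == "big":  # array uses native byte order
        samples.byteswap()
    return [s / 32768.0 for s in samples]


def float_to_pcm16(samples: List[float]) -> bytes:
    """Floats -> little-endian 16-bit PCM bytes, clipping to the int16 range."""
    ints = array.array("h", (max(-32768, min(32767, round(x * 32768.0))) for x in samples))
    if sys.byteorder == "big":
        ints.byteswap()
    return ints.tobytes()


def write_wav(f: Union[str, BinaryIO], pcm: bytes, rate: int = 16000, channels: int = 1) -> None:
    """Save raw 16-bit PCM (e.g. a buffer read from PortAudio or ALSA) as WAV."""
    with wave.open(f, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # bytes per sample for 16-bit audio
        w.setframerate(rate)
        w.writeframes(pcm)
```

For example, `write_wav("capture.wav", buffer)` stores a captured buffer for inspection. Passing an `io.BytesIO` instead keeps everything in memory.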