To make a smart speaker

Chinese (中文)

Here is a collection of resources for making a smart speaker. ~~Hope we can make an open source one for daily use.~~ I believe we have enough resources to make an open source smart speaker. Let's do it. Follow the progress of the project named Smart Speaker from Scratch on Hackaday.

The first kit of the project will be available at the end of November. It is available for pre-order now!

A simplified flowchart of a smart speaker looks like this:

+---+   +----------------+   +---+   +---+   +---+
|Mic|-->|Audio Processing|-->|KWS|-->|STT|-->|NLU|
+---+   +----------------+   +---+   +---+   +-+-+
                                               |
                                               |
+-------+   +---+   +----------------------+   |
|Speaker|<--|TTS|<--|Knowledge/Skill/Action|<--+
+-------+   +---+   +----------------------+

Audio Processing includes Acoustic Echo Cancellation (AEC), Beamforming, Noise Suppression (NS), etc.
Keyword Spotting (KWS) detects a keyword (such as OK Google, Hey Siri) to start a conversation.
Speech To Text (STT) converts the recorded speech into text.
Natural Language Understanding (NLU) converts raw text into structured data.
Knowledge/Skill/Action - Knowledge base and plugins (Alexa Skill, Google Action) to provide an answer.
Text To Speech (TTS) converts the text answer back into speech.
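To make the flow above concrete, here is a minimal sketch of the stages chained together. It is not taken from any of the projects listed here: every function is a hypothetical stand-in, the "audio" is just a text string, and the wake word `hey speaker` is an assumption chosen for illustration. A real speaker would replace each stub with one of the engines listed in the sections below.

```python
WAKE_WORD = "hey speaker"  # hypothetical keyword, for illustration only


def audio_processing(utterance):
    # Placeholder for AEC, beamforming and noise suppression.
    # Here the "audio" is plain text, so we only normalize it.
    return utterance.strip().lower()


def kws(utterance):
    # Keyword spotting: only continue if the wake word was heard.
    return utterance.startswith(WAKE_WORD)


def stt(utterance):
    # Speech to text: drop the wake word and keep the command.
    return utterance[len(WAKE_WORD):].strip(" ,")


def nlu(text):
    # Natural language understanding: raw text -> structured data.
    if "light" in text and "on" in text.split():
        return {"intent": "light_on"}
    return {"intent": "unknown"}


def skill(intent):
    # Knowledge/Skill/Action: pick an answer for the intent.
    answers = {"light_on": "Okay, turning on the light."}
    return answers.get(intent["intent"], "Sorry, I did not understand that.")


def tts(answer):
    # Text to speech: a real engine would return audio samples.
    return f"[speech] {answer}"


def handle(utterance):
    cleaned = audio_processing(utterance)
    if not kws(cleaned):
        return None  # no wake word, stay idle
    return tts(skill(nlu(stt(cleaned))))


print(handle("Hey Speaker, turn on the light"))
```

The point of the sketch is only the ordering of the stages: nothing after KWS runs until the wake word is detected.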

KWS + STT + NLU + Skill + TTS

Active open source projects

Snips ⭐ - the first 100% on-device and private-by-design open-source Voice AI platform
Mycroft ⭐ - a hackable open source voice assistant
SEPIA 🤖 - Highly customizable, open-source, cross-platform voice assistant and VUI framework (HTML + Java + x)
Kalliope - a framework that helps you create your own personal assistant, similar to Mycroft (both are written in Python)
dingdang robot - a 🇨🇳 voice interaction robot based on Jasper and built with Raspberry Pi

SDK

Amazon Alexa Voice Service - the most widely used voice assistant
Google Assistant SDK - It has the smartest brain. Its extensions, called Google Actions, can be created in a few steps with Dialogflow, and its Device Actions are well suited to smart home devices.
Baidu DuerOS
Snips
- Install Snips on Raspberry Pi 3, Linux, macOS, iOS and Android
SEPIA Installation, SEPIA with Porcupine + ReSpeaker

KWS

Mycroft Precise - A lightweight, simple-to-use, RNN wake word listener
Snowboy - DNN based hotword and wake word detection toolkit
Honk - PyTorch reimplementation of Google's TensorFlow CNNs for keyword spotting
ML-KWS-For-MCU - Perhaps the most promising option for resource-constrained devices such as ARM Cortex-M7 microcontrollers
Porcupine - Lightweight, cross-platform engine to build custom wake words in seconds
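The engines above are usually run on short audio frames and return a keyword score per frame. As a rough sketch of how such scores can be turned into a wake-up decision, the snippet below averages the most recent scores and fires once the average crosses a threshold. The function name, the window size and the threshold are assumptions made for illustration. None of the listed projects is claimed to work exactly this way.

```python
from collections import deque


def detect_wake_word(frame_scores, window=5, threshold=0.8):
    """Return the index of the first frame at which the average of the
    last `window` scores reaches `threshold`, or None if it never does."""
    recent = deque(maxlen=window)
    for i, score in enumerate(frame_scores):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= threshold:
            return i
    return None


# Hypothetical per-frame keyword probabilities from some KWS model.
scores = [0.1, 0.95, 0.1, 0.2, 0.9, 0.9, 0.9, 0.9, 0.9, 0.3]
print(detect_wake_word(scores))  # 8
```

Smoothing over a window keeps a single noisy frame (the 0.95 spike at index 1) from waking the speaker by accident.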

STT

Mozilla DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
Kaldi
wav2letter++ - a fast, open source speech processing toolkit from the Speech team at Facebook AI Research built to facilitate research in end-to-end models for speech recognition.
Zamia Speech - Open tools, data, models (kaldi models and wav2letter++ models) for cloudless automatic speech recognition. It can be run on Raspberry Pi
PocketSphinx - a lightweight speech recognition engine using HMM + GMM

NLU

Rasa NLU
- Rasa NLU for Chinese
Snips NLU - a Python library that parses sentences written in natural language and extracts structured information.
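To illustrate what "structured information" means here, the following sketch maps a sentence to an intent plus slots using regular expressions. It is a rule-based stand-in, not the API of Rasa NLU or Snips NLU (both rely on trained models). The intent names, slot names and patterns are hypothetical.

```python
import re

# Hypothetical intents and patterns, for illustration only.
PATTERNS = [
    ("set_timer",
     re.compile(r"set a timer for (?P<duration>\d+) (?P<unit>seconds?|minutes?|hours?)")),
    ("turn_on",
     re.compile(r"turn on the (?P<device>[\w ]+?) in the (?P<room>[\w ]+)$")),
]


def parse(text):
    """Convert raw text into a dict with the matched intent and its slots."""
    text = text.lower().strip()
    for intent, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return {"text": text, "intent": intent, "slots": match.groupdict()}
    return {"text": text, "intent": None, "slots": {}}


print(parse("Set a timer for 10 minutes"))
print(parse("Turn on the light in the living room"))
```

The returned dictionary is the kind of structure a skill can act on without having to look at the original sentence again.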

TTS

Mozilla TTS - Deep learning for Text to Speech
Mimic - Mycroft's TTS engine, based on CMU's Flite (Festival Lite)
MaryTTS - an open-source, multilingual text-to-speech synthesis system written in pure Java
espeak-ng - an open source speech synthesizer that supports 99 languages and accents.
ekho - Chinese text-to-speech engine
WaveNet, Tacotron 2

Audio Processing

Acoustic Echo Cancellation
- SpeexDSP, its python binding speexdsp-python
- EC - Echo Cancellation Daemon based on SpeexDSP AEC for Raspberry Pi or other devices running Linux.
Direction Of Arrival (DOA) - The most widely used DOA algorithm is GCC-PHAT
- tdoa
- odas - ODAS stands for Open embeddeD Audition System. This is a library dedicated to perform sound source localization, tracking, separation and post-filtering. ODAS is coded entirely in C, for more portability, and is optimized to run easily on low-cost embedded hardware. ODAS is free and open source.
Beamforming
- BeamformIt - filter&sum beamforming
- CGMM Beamforming - a reference implementation
- MVDR Beamforming
- GSC Beamforming
Voice Activity Detection
- WebRTC VAD, py-webrtcvad
- DNN VAD
Noise Suppression
- NS of WebRTC audio processing, python-webrtc-audio-processing
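GCC-PHAT, mentioned under Direction Of Arrival, estimates the delay between two microphone signals by whitening their cross-spectrum so that only phase information remains, and then picking the peak of the resulting cross-correlation. Below is a minimal sketch of that idea. To stay dependency-free it uses a naive DFT written with `cmath`. Practical implementations would use an FFT library instead. The function names and the test signal are assumptions for illustration and are not taken from tdoa or odas.

```python
import cmath
import random


def dft(x, inverse=False):
    # Naive O(n^2) discrete Fourier transform, fine for short signals.
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref, max_shift=None):
    """Return the delay of `sig` relative to `ref`, in samples.
    A positive result means `sig` arrives later than `ref`."""
    n = len(sig) + len(ref)  # zero-pad to avoid circular wrap-around
    X = dft(list(sig) + [0.0] * (n - len(sig)))
    Y = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [a * b.conjugate() for a, b in zip(X, Y)]
    # PHAT weighting: normalize the magnitude, keep only the phase.
    cross = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    cc = [v.real for v in dft(cross, inverse=True)]
    if max_shift is None:
        max_shift = n // 2
    max_shift = min(max_shift, n // 2)
    # Negative lags live at the end of the circular correlation.
    lags = range(-max_shift, max_shift + 1)
    return max(lags, key=lambda lag: cc[lag % n])


# Synthetic example: white noise and a copy delayed by 3 samples.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(32)]
sig = [0.0] * 3 + ref
print(gcc_phat(sig, ref))  # 3
```

With a known microphone spacing and sample rate, the estimated delay can then be converted into an arrival angle.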

Audio I/O

PortAudio, pyaudio
libsoundio
ALSA
PulseAudio
PipeWire
