/make-a-smart-speaker

A collection of resources to make a smart speaker

To make a smart speaker

中文

Here is a collection of resources to make a smart speaker. Hope we can make an open source one for daily use. I believe we have enough resources to make an open source smart speaker. Let's do it. Take a look at the progress of the project named smart speaker from scratch on hackaday.

The first kit of the project will be available at the end of November. It is on pre-order Now!

The simplified flowchart of a smart speaker is like:

+---+   +----------------+   +---+   +---+   +---+
|Mic|-->|Audio Processing|-->|KWS|-->|STT|-->|NLU|
+---+   +----------------+   +---+   +---+   +-+-+
                                               |
                                               |
+-------+   +---+   +----------------------+   |
|Speaker|<--|TTS|<--|Knowledge/Skill/Action|<--+
+-------+   +---+   +----------------------+
  • Audio Processing includes Acoustic Echo Cancellation (AEC), Beamforming, Noise Suppression (NS), etc.
  • Keyword Spotting (KWS) detects a keyword (such as OK Google, Hey Siri) to start a conversation.
  • Speech To Text (STT)
  • Natural Language Understanding (NLU) converts raw text into structured data.
  • Knowledge/Skill/Action - Knowledge base and plugins (Alexa Skill, Google Action) to provide an answer.
  • Text To Speech

KWS + STT + NLU + Skill + TTS

Active open source projects

  • Snips ⭐ - the first 100% on-device and private-by-design open-source Voice AI platform
  • Mycroft ⭐ - a hackable open source voice assistant
  • SEPIA 🤖 - Highly customizable, open-source, cross-platform voice assistant and VUI framework (HTML + Java + x)
  • Kalliope - a framework that will help you to create your own personal assistant, kind of similar with Mycroft (Both written by Python)
  • dingdang robot - a 🇨🇳 voice interaction robot based on Jasper and built with raspberry pi

SDK

KWS

  • Mycroft Precise - A lightweight, simple-to-use, RNN wake word listener
  • Snowboy - DNN based hotword and wake word detection toolkit
  • Honk - PyTorch reimplementation of Google's TensorFlow CNNs for keyword spotting
  • ML-KWS-For-MCU - Maybe the most promise for resource constrained devices such as ARM Cortex M7 microcontroller
  • Porcupine - Lightweight, cross-platform engine to build custom wake words in seconds

STT

  • Mozilla DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
  • Kaldi
  • wav2letter++ - a fast, open source speech processing toolkit from the Speech team at Facebook AI Research built to facilitate research in end-to-end models for speech recognition.
  • Zamia Speech - Open tools, data, models (kaldi models and wav2letter++ models) for cloudless automatic speech recognition. It can be run on Raspberry Pi
  • PocketSphinx - a lightweight speech recognition engine using HMM + GMM

NLU

TTS

  • Mozilla TTS - Deep learning for Text to Speech
  • Mimic - Mycroft's TTS engine, based on CMU's Flite (Festival Lite)
  • manytts - an open-source, multilingual text-to-speech synthesis system written in pure java
  • espeak-ng - an open source speech synthesizer that supports 99 languages and accents.
  • ekho - Chinese text-to-speech engine
  • WaveNet, Tacotron 2

Audio Processing

  • Acoustic Echo Cancellation

    • SpeexDSP, its python binding speexdsp-python
    • EC - Echo Cancelation Daemon based on SpeexDSP AEC for Raspberry Pi or other devices running Linux.
  • Direction Of Arrival (DOA) - Most used DOA algorithms is GCC-PHAT

    • tdoa
    • odas - ODAS stands for Open embeddeD Audition System. This is a library dedicated to perform sound source localization, tracking, separation and post-filtering. ODAS is coded entirely in C, for more portability, and is optimized to run easily on low-cost embedded hardware. ODAS is free and open source.
  • Beamforming

  • Voice Activity Detection

  • Noise Suppresion

Audio I/O