This is a PyTorch implementation of keyword spotting based on the WaveNet architecture. To get started, jump straight to the code, run the demo notebook in Google Colab, or continue reading.
To install the repository with all requirements, run:

```shell
pip install git+https://github.com/AndBondStyle/wavenet-keyword-spotting
```
The `wavenet_kws` directory will be installed as a package (`import wavenet_kws`).
If you're planning to generate a custom dataset, you will also need to install ffmpeg and rubberband.
A custom dataset consists of three directories:

- `positives`: cropped samples of the true keyword you want to detect
- `negatives`: cropped samples of fake keywords (e.g. similar-sounding words)
- `random_speech`: random speech samples that do not contain the keyword

All files should have the `.wav` extension and a 16000 Hz sample rate.
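As a sanity check before training, the expected layout and format can be verified with a short script. The helper below is illustrative, not part of the repository; it only uses the standard-library `wave` module:

```python
import os
import wave

def validate_dataset(root):
    """Check that a custom dataset has the three expected directories
    and that every file in them is a 16000 Hz .wav file."""
    problems = []
    for subdir in ("positives", "negatives", "random_speech"):
        path = os.path.join(root, subdir)
        if not os.path.isdir(path):
            problems.append(f"missing directory: {subdir}")
            continue
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if not name.endswith(".wav"):
                problems.append(f"not a .wav file: {full}")
                continue
            with wave.open(full, "rb") as f:
                if f.getframerate() != 16000:
                    problems.append(f"wrong sample rate: {full}")
    return problems  # empty list means the dataset looks fine
```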
The dataset can be generated using the dataset.py script (look inside for details). Before running it, you will also need to download AudioSet (used as a background-noise source) via the audioset_download.ipynb notebook.
Training is done using the training.py script (look inside for details).
An important note: the model checkpoint file also contains the dataset configuration (see the `DatasetConfig` class) and the model config (the keyword arguments used to initialize the model, as in `WavenetKWS(**kwargs)`). This makes loading models very convenient: check `model_from_checkpoint` for details.
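The bundling pattern itself is easy to illustrate. The sketch below mimics it with plain dicts and pickle instead of torch; the key names (`model_config`, `dataset_config`, `state_dict`) and the `DummyModel` class are assumptions for illustration, not the repository's actual format:

```python
import pickle

class DummyModel:
    # Hypothetical stand-in for the real model class; in the repository
    # the equivalent would be WavenetKWS(**kwargs).
    def __init__(self, layers=4, channels=32):
        self.layers = layers
        self.channels = channels

def save_checkpoint(path, model_config, dataset_config, state_dict):
    # Store everything needed to rebuild the model in a single file.
    # Key names here are illustrative; the repository may use others.
    checkpoint = {
        "model_config": model_config,
        "dataset_config": dataset_config,
        "state_dict": state_dict,
    }
    with open(path, "wb") as f:
        pickle.dump(checkpoint, f)

def model_from_checkpoint(path, model_cls=DummyModel):
    # Rebuild the model from the stored keyword arguments, then return
    # it together with the dataset config that produced its training data.
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    model = model_cls(**checkpoint["model_config"])
    # (with torch you would also call model.load_state_dict(...) here)
    return model, checkpoint["dataset_config"]
```

The point of the pattern is that a checkpoint is self-describing: no separate config file has to be kept in sync with the weights.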
There are several scripts to try your model (or a pretrained one):

- colab_demo: a notebook specially designed to run in Google Colab
- live_mic_detection: a notebook with real-time prediction from a microphone stream
- detection.py: a console version of the above notebook, if you don't like Jupyter
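For a rough idea of how real-time detection works, the sketch below maintains a sliding window over incoming audio chunks and runs a detector on each full window. All names here are assumptions for illustration; the real logic lives in detection.py and the notebooks:

```python
from collections import deque

SAMPLE_RATE = 16000
WINDOW = SAMPLE_RATE   # one-second analysis window, in samples
CHUNK = 1024           # samples delivered per microphone callback

def make_stream_detector(detect_fn, window=WINDOW):
    """Return a callback that buffers audio chunks and calls detect_fn
    on the most recent `window` samples once enough audio has arrived."""
    buf = deque(maxlen=window)  # ring buffer: old samples fall off the front

    def on_chunk(chunk):
        buf.extend(chunk)
        if len(buf) < window:
            return None  # not enough audio yet
        return detect_fn(list(buf))

    return on_chunk
```

In a real script, `on_chunk` would be wired to a microphone stream callback and `detect_fn` would run the model; here any function over a list of samples works, e.g. a simple energy threshold.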