This is a PyTorch implementation of keyword spotting based on the WaveNet architecture. To get started, jump straight to the code, run the demo notebook in Google Colab, or continue reading.
To install the repository with all requirements, run:

```shell
pip install git+https://github.com/AndBondStyle/wavenet-keyword-spotting
```
The `wavenet_kws` directory will be installed as a package (`import wavenet_kws`).
If you're planning to generate a custom dataset, you will also need to install ffmpeg and rubberband.
A custom dataset consists of three directories:

- `positives`: cropped samples of the true keyword you want to detect
- `negatives`: cropped samples of fake keywords (e.g. similar-sounding words)
- `random_speech`: random speech samples that do not contain the keyword

All files should have the `.wav` extension and a 16000 Hz sample rate.
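As a sanity check before training, the expected layout and format can be verified with a short script. The helper below is illustrative, not part of the repository; it only uses the standard-library `wave` module:

```python
import os
import wave

def validate_dataset(root):
    """Check that a custom dataset has the three expected directories
    and that every file in them is a 16000 Hz .wav file."""
    problems = []
    for subdir in ("positives", "negatives", "random_speech"):
        path = os.path.join(root, subdir)
        if not os.path.isdir(path):
            problems.append(f"missing directory: {subdir}")
            continue
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if not name.endswith(".wav"):
                problems.append(f"not a .wav file: {full}")
                continue
            with wave.open(full, "rb") as f:
                if f.getframerate() != 16000:
                    problems.append(f"wrong sample rate: {full}")
    return problems  # empty list means the dataset looks fine
```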
The dataset can be generated using the dataset.py script (look inside for details). Before running it, you will also need to download AudioSet (used as a background-noise source) via the audioset_download.ipynb notebook.
Training is done using the training.py script (look inside for details).
An important note: the model checkpoint file also contains the dataset configuration (see the `DatasetConfig` class) and the model config (the keyword arguments used to initialize the model, as in `WavenetKWS(**kwargs)`). This makes loading models very convenient: check `model_from_checkpoint` for details.
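The bundling pattern itself is easy to illustrate. The sketch below mimics it with plain dicts and pickle instead of torch; the key names (`model_config`, `dataset_config`, `state_dict`) and the `DummyModel` class are assumptions for illustration, not the repository's actual format:

```python
import pickle

class DummyModel:
    # Hypothetical stand-in for the real model class; in the repository
    # the equivalent would be WavenetKWS(**kwargs).
    def __init__(self, layers=4, channels=32):
        self.layers = layers
        self.channels = channels

def save_checkpoint(path, model_config, dataset_config, state_dict):
    # Store everything needed to rebuild the model in a single file.
    # Key names here are illustrative; the repository may use others.
    checkpoint = {
        "model_config": model_config,
        "dataset_config": dataset_config,
        "state_dict": state_dict,
    }
    with open(path, "wb") as f:
        pickle.dump(checkpoint, f)

def model_from_checkpoint(path, model_cls=DummyModel):
    # Rebuild the model from the stored keyword arguments, then return
    # it together with the dataset config that produced its training data.
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    model = model_cls(**checkpoint["model_config"])
    # (with torch you would also call model.load_state_dict(...) here)
    return model, checkpoint["dataset_config"]
```

The point of the pattern is that a checkpoint is self-describing: no separate config file has to be kept in sync with the weights.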
There are several scripts to try your model (or a pretrained one):

- colab_demo: a notebook specially designed to run in Google Colab
- live_mic_detection: a notebook with real-time prediction from a microphone stream
- detection.py: a console version of the above notebook, if you don't like Jupyter
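For a rough idea of how real-time detection works, the sketch below maintains a sliding window over incoming audio chunks and runs a detector on each full window. All names here are assumptions for illustration; the real logic lives in detection.py and the notebooks:

```python
from collections import deque

SAMPLE_RATE = 16000
WINDOW = SAMPLE_RATE   # one-second analysis window, in samples
CHUNK = 1024           # samples delivered per microphone callback

def make_stream_detector(detect_fn, window=WINDOW):
    """Return a callback that buffers audio chunks and calls detect_fn
    on the most recent `window` samples once enough audio has arrived."""
    buf = deque(maxlen=window)  # ring buffer: old samples fall off the front

    def on_chunk(chunk):
        buf.extend(chunk)
        if len(buf) < window:
            return None  # not enough audio yet
        return detect_fn(list(buf))

    return on_chunk
```

In a real script, `on_chunk` would be wired to a microphone stream callback and `detect_fn` would run the model; here any function over a list of samples works, e.g. a simple energy threshold.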