Keyword spotting, Speech wake_up, pytorch, DNN, CNN, TDNN, DFSMN, LSTM
This project is based on the ICASSP 2014 paper "Small-footprint keyword spotting using deep neural networks".
We implement the idea with several deep neural network architectures: DNN, CNN, TDNN, DFSMN, and LSTM.
The project can be applied to tasks such as keyword spotting and speech wake-up.
command_loader.py: Defines CommandLoader for data extraction. The data is structured as follows:
- path/keyword/audio file (.wav)
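The directory layout above can be indexed with a few lines of standard-library Python. This is only a sketch of the idea, not the actual CommandLoader: the function name `index_commands` and the choice to sort class names alphabetically are assumptions made for illustration.

```python
from pathlib import Path


def index_commands(root):
    """Return (samples, classes) for a root/<keyword>/<clip>.wav layout.

    Hypothetical helper: each sub-directory name is treated as one keyword
    class, and every .wav file inside it becomes a (path, class_index) pair.
    """
    root = Path(root)
    # One class per sub-directory, sorted so that indices are reproducible.
    classes = sorted(d.name for d in root.iterdir() if d.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = [
        (wav, class_to_idx[name])
        for name in classes
        for wav in sorted((root / name).glob("*.wav"))
    ]
    return samples, classes
```

A dataset class could call such a helper once in its constructor and load the audio lazily per sample.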
model.py: Implementation of several backbones: DNN, CNN, TDNN, DFSMN, LSTM.
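As a rough illustration of the simplest backbone, a fully connected keyword classifier might look like the sketch below. The class name `SmallDNN` and all sizes (40 features per frame, a 41-frame context window, three hidden layers of 128 units, 30 output classes) are illustrative assumptions, not values taken from model.py.

```python
import torch
from torch import nn


class SmallDNN(nn.Module):
    """Fully connected classifier over a stacked window of feature frames.

    Hypothetical sketch: all default sizes are assumptions for illustration.
    """

    def __init__(self, n_feats=40, n_frames=41, hidden=128, n_layers=3, n_classes=30):
        super().__init__()
        layers = []
        in_dim = n_feats * n_frames  # the frames are flattened into one vector
        for _ in range(n_layers):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, n_classes))  # one logit per keyword class
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: (batch, n_frames, n_feats) -> logits: (batch, n_classes)
        return self.net(x.flatten(start_dim=1))


model = SmallDNN()
logits = model(torch.randn(8, 41, 40))  # a batch of 8 random feature windows
```

The other backbones would replace the stack of linear layers with convolutional, time-delay, memory, or recurrent layers while keeping the same input and output shapes.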
train.py: Definition of training & testing process.
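A training and testing loop for a classifier of this kind could be sketched as follows. The function names `train_one_epoch` and `evaluate`, and the use of cross-entropy loss with top-1 accuracy, are assumptions for illustration and may differ from what train.py actually does.

```python
import torch
from torch import nn


def train_one_epoch(model, loader, optimizer, device="cpu"):
    """Run one pass over `loader` and return the mean cross-entropy loss."""
    model.train()
    criterion = nn.CrossEntropyLoss()
    total_loss, n_samples = 0.0, 0
    for feats, labels in loader:
        feats, labels = feats.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(feats), labels)
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size so the mean is per sample.
        total_loss += loss.item() * labels.size(0)
        n_samples += labels.size(0)
    return total_loss / n_samples


@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Return the top-1 accuracy of `model` on `loader`."""
    model.eval()
    correct, n_samples = 0, 0
    for feats, labels in loader:
        feats, labels = feats.to(device), labels.to(device)
        preds = model(feats).argmax(dim=1)
        correct += (preds == labels).sum().item()
        n_samples += labels.size(0)
    return correct / n_samples
```

Here `loader` can be any iterable of `(features, labels)` batches, such as a `torch.utils.data.DataLoader`.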
run.py: Main program for training & testing. Possible parameters are explained below:
Speech wake-up:
- MobvoiHotwords: A corpus of wake-up words collected from a commercial Mobvoi smart speaker.
- Contains audio of "Hi xiaowen" and "Nihao Wenwen", as well as noise speech.
- Homepage
Keyword spotting:
- Synthetic Speech Commands Dataset.
- Consists of keyword audio in thirty categories, e.g., "bed", "bird", "cat", "dog", "eight", "five", "stop", "wow", "zero".
- Download link
Keyword spotting:
- Accuracy vs. number of batches with STFT:
- Accuracy vs. number of batches with Deep KWS:
- Accuracy:
Model | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 | Test |
---|---|---|---|---|---|---|
DNN | 38.57% | 52.85% | 58.81% | 67.48% | 71.00% | 62.59% |
CNN | 95.30% | 96.12% | 96.30% | 97.20% | 96.75% | 95.17% |
TDNN | 70.10% | 69.02% | 74.35% | 77.87% | 80.76% | 76.50% |
LSTM | 57.36% | 74.35% | 75.16% | 79.31% | 81.39% | 78.75% |
DFSMN | 91.15% | 92.14% | 94.94% | 93.86% | 94.04% | 90.34% |
DNN(KWS) | 87.97% | 90.37% | 91.04% | 91.18% | 90.44% | 89.67% |
Speech wake-up:
- Accuracy with Deep_KWS:
- Loss with Deep_KWS: