Speech recognition of keyword commands

Keyword spotter

A simple project on speech recognition.

Sebastian Thomas (datascience at sebastianthomas dot de)

In this project, we intend to recognize a keyword out of a list of ten given keywords.

It extends the introductory TensorFlow tutorial on speech command recognition.

It uses the speech_commands dataset by Pete Warden, version 0.0.2. The dataset contains 105,829 WAV files, each at most 1 second long. Each file contains one spoken command from a list of 35 commands.
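Because the clips are at most 1 second long, they usually have to be brought to a common length before they can be batched. The following is a minimal sketch of that step, not the preprocessing used in this project. It assumes 16 kHz, 16-bit mono PCM files (the format described in Warden's paper); the function name load_fixed_length and the constant names are hypothetical.

```python
import io
import struct
import wave

SAMPLE_RATE = 16_000      # assumption: 16 kHz mono, 16-bit PCM clips
TARGET_LEN = SAMPLE_RATE  # one second of audio


def load_fixed_length(source, target_len=TARGET_LEN):
    """Read a 16-bit mono WAV and zero-pad or truncate it to target_len samples.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        raw = wav.readframes(wav.getnframes())

    # Decode little-endian signed 16-bit samples.
    n_samples = len(raw) // 2
    samples = list(struct.unpack(f"<{n_samples}h", raw[: 2 * n_samples]))

    # Truncate clips that are too long, then pad short clips with silence.
    samples = samples[:target_len]
    samples += [0] * (target_len - len(samples))
    return samples
```

Zero-padding at the end is only one option; padding on both sides or at a random offset would work as well and doubles as a mild form of augmentation.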

For demonstration purposes, a REST API was implemented. It was inspired by a tutorial from Valerio Velardo's series Deep Learning (Audio) Application: From Design to Deployment.

Content

Data mining, analysis, training and evaluation of the classifier:

Main development:

REST API:

Future work

  • tune more hyperparameters
  • use class weights for training (we have imbalanced classes)
  • add background noise to the instances
  • use other forms of data augmentation, e.g. time shifting
  • add a silence label
  • consider other classifier models
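
Three of the items above (class weights, background noise, time shifting) can be sketched in a few lines. This is an illustration under stated assumptions, not code from this project: clips are plain lists of float samples, all function names are hypothetical, and the weighting formula is the common "balanced" heuristic n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def time_shift(samples, shift):
    """Shift a clip by `shift` samples, keeping its length.

    Positive values delay the signal, negative values advance it;
    the vacated positions are filled with silence (zeros).
    """
    n = len(samples)
    if shift >= 0:
        return ([0.0] * min(shift, n) + list(samples))[:n]
    return (list(samples) + [0.0] * min(-shift, n))[-n:]


def add_background_noise(samples, noise, noise_gain=0.1):
    """Mix a noise clip of the same length into the signal at the given gain."""
    if len(samples) != len(noise):
        raise ValueError("signal and noise must have the same length")
    return [s + noise_gain * n for s, n in zip(samples, noise)]
```

If the classifier is trained with Keras, a dictionary like the one returned by balanced_class_weights could be passed as the class_weight argument of model.fit, provided the labels are integer class indices. The shift amount and noise gain would typically be drawn at random per training example.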

References

Warden, Pete: Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. arXiv:1804.03209, 2018.

Velardo, Valerio: Deep Learning (Audio) Application: From Design to Deployment. YouTube, 2020.