/speech_recog

Speech Recognition for TensorFlow challenge

Primary language: Python. License: MIT.

The goal is to assign one of the following 12 labels to each command: yes, no, up, down, left, right, on, off, stop, go, silence, unknown.
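As a rough illustration of the label set above, the sketch below maps a spoken word to one of the 12 target labels. The function name `to_target_label` and the example word "bed" are hypothetical and are not taken from this repository; the sketch only assumes that any word outside the ten commands counts as unknown.

```python
# The 12 target labels listed above.
COMMANDS = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]
LABELS = COMMANDS + ["silence", "unknown"]


def to_target_label(word):
    """Map a raw word to one of the 12 labels (hypothetical helper)."""
    word = word.strip().lower()
    if word in COMMANDS or word == "silence":
        return word
    # Assumption: every other word is treated as "unknown".
    return "unknown"


def label_index(word):
    """Return the integer class index for a raw word."""
    return LABELS.index(to_target_label(word))


print(to_target_label("Stop"))
print(to_target_label("bed"))
```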

I used mel-scaled spectrograms and mel-frequency cepstral coefficients as inputs for two NASNet-A Convolutional Neural Networks and then averaged their predictions.

My solution

I used PyTorch to train two NASNet-A convolutional neural networks. The first network was trained on mel-scaled spectrograms, the second on mel-frequency cepstral coefficients. I then averaged their predictions to make the final submission.
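The averaging step can be sketched as follows. This is a minimal NumPy illustration, not the code from this repository: it assumes each network outputs one row of 12 logits per clip and that the averaging is done over softmax probabilities (the README does not say whether probabilities or logits were averaged). The names `softmax` and `average_predictions` are made up for the example.

```python
import numpy as np

LABELS = ["yes", "no", "up", "down", "left", "right",
          "on", "off", "stop", "go", "silence", "unknown"]


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def average_predictions(logits_spec, logits_mfcc):
    """Average the class probabilities of the two models and pick a label.

    logits_spec: outputs of the spectrogram model, shape (n_clips, 12)
    logits_mfcc: outputs of the MFCC model, shape (n_clips, 12)
    """
    probs = (softmax(logits_spec) + softmax(logits_mfcc)) / 2.0
    predicted = [LABELS[i] for i in probs.argmax(axis=1)]
    return probs, predicted


# Toy example: the first model weakly prefers "yes",
# the second model strongly prefers "no".
spec = np.zeros((1, 12))
spec[0, 0], spec[0, 1] = 2.0, 1.0
mfcc = np.zeros((1, 12))
mfcc[0, 1] = 5.0
probs, predicted = average_predictions(spec, mfcc)
print(predicted)
```

Averaging probabilities lets a confident model outweigh an uncertain one, which is why the toy example resolves to the second model's choice.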

Examples of mel-scaled spectrograms for speech commands:

[Figure: mel-scaled spectrograms for speech commands]
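For readers unfamiliar with the input representation, here is a minimal NumPy sketch of how a log mel-scaled spectrogram can be computed from a waveform. It is not the preprocessing used in this repository. The sample rate (16 kHz), FFT size, hop length, and number of mel bands are all assumed values chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Return a log mel spectrogram of shape (n_mels, n_frames)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping frames and apply a Hann window.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum per frame, then project onto the mel filters.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10).T


# One second of a 440 Hz tone at the assumed 16 kHz sample rate.
t = np.arange(16000) / 16000.0
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)  # (40, 97)
```

Mel-frequency cepstral coefficients are typically obtained by applying a discrete cosine transform to a log mel spectrogram like this one.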

Requirements

To run the code

  • Adjust config variables in config.py
  • Execute the run.sh file