/audio-recognition

Deep learning tutorials about audio data

Important papers and implementations

A. Graves, A. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Proc. ICASSP, Vancouver, 2013.

J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, “Attention-based models for speech recognition,” in Proc. NIPS, 2015.

Deep Speech 2 (ICML 2016)

notes:

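  • "SortaGrad": sort utterances by increasing length during the first epoch, then go back to random minibatch order.
  • Batch normalization ("BatchNorm"), applied to the recurrent layers as well.
  • Trained end to end with the CTC loss, so no frame-level alignment is needed.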
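
The SortaGrad note above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Deep Speech 2 code: the function name `sortagrad_batches`, the `(utterance_id, num_frames)` pair format, and the `seed` parameter are all hypothetical, and keeping the length-bucketed minibatches after the first epoch is a choice made here for simplicity.

```python
import random


def sortagrad_batches(utterances, batch_size, epoch, seed=0):
    """Return the list of minibatches for one epoch.

    `utterances` is a list of (utterance_id, num_frames) pairs.
    This input format is an assumption made for the sketch.
    """
    # Sort by length so each minibatch holds utterances of similar length.
    by_length = sorted(utterances, key=lambda u: u[1])
    batches = [
        by_length[i:i + batch_size]
        for i in range(0, len(by_length), batch_size)
    ]
    # Epoch 0: keep the short-to-long order (the curriculum part).
    # Later epochs: visit the same minibatches in random order.
    if epoch > 0:
        random.Random(seed + epoch).shuffle(batches)
    return batches


# Hypothetical toy data: utterance ids with their lengths in frames.
utts = [("a", 50), ("b", 10), ("c", 30), ("d", 20), ("e", 40)]
first_epoch = sortagrad_batches(utts, batch_size=2, epoch=0)
later_epoch = sortagrad_batches(utts, batch_size=2, epoch=1)
```

In the first epoch the model sees the shortest utterances first, which is the ordering the note describes; from the second epoch on, only the order of the minibatches changes.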