Automatic Speech Recognition (ASR) is an interdisciplinary task that draws on Signal Processing, Natural Language Processing, and Machine Learning. This notebook aims to explain each of these areas for anyone new to ASR, and to teach myself along the way!
(07/08/2021)
The Notebook's code implementation of the learning model isn't complete, but its written explanations of DSP, NLP, and the ML architecture for ASR still hold. For a working implementation of the Transformer architecture, see Apoorv Nandan's fantastic example, which makes up the entirety of my Notebook's current learning architecture.
The model currently doesn't produce an appropriate loss! If you'd like to help with that, please open an Issue :)
See the bottom of the Notebook for important readings!