https://github.com/besacier/ASR2022/blob/main/ASRcourseMOSIG2022.pdf
This is an introduction (97 slides) to modern ASR systems and speech SSL. Content for 6h-8h of lecture.
Remarks:
(a) The 'current challenges' part at the end needs to be refreshed
(b) SSL part does not (yet) include the following models:
- MAESTRO paper (from Interspeech 2022) https://arxiv.org/abs/2204.03409?context=eess
- Speech2C model (by same authors as SpeechT5) https://www.isca-speech.org/archive/pdfs/interspeech_2022/ao22_interspeech.pdf
- Data2Vec2.0 paper https://ai.facebook.com/research/publications/efficient-self-supervised-learning-with-contextualized-target-representations-for-vision-speech-and-language
(c) The Whisper model is not included in the current slides
Please send your feedback to: laurent.besacier@naverlabs.com