This repository documents my journey into Audio ML as I fill the gaps in my knowledge, coming from a more general deep learning background. The exploration is not strictly linear and may branch into broader deep learning topics. It is a space for experiments, implementations of research papers, blog posts, and tutorials on related subjects.
-
ASR:
-
Fast-Conformer: Implementing the Fast-Conformer paper (https://arxiv.org/pdf/2305.05084), and probably other variants and optimizations.
-
[WIP] Study Slam
-
[WIP] Blog post about Audio LMs
-
[WIP] Blog post about distil-whisper + tutorial
-
TTS:
-
Torchaudio tutorials: https://pytorch.org/audio/main/index.html
-
WAVLab Lectures on Speech Recognition and Understanding: https://www.youtube.com/@wavlab3016/videos
-
Training recipe for Speech LMs: https://github.com/slp-rl/slamkit
-
Conversational AI Reading Group: https://poonehmousavi.github.io/rg