My Journey into Learning Audio ML

About

This repository documents my journey into learning Audio ML as I fill the gaps in my knowledge coming from a more general deep learning background. The exploration is not strictly linear and may branch into broader deep learning topics. It serves as a space for experimenting, implementing research papers, writing blog posts, and creating tutorials on related subjects.

What I am currently working on

ASR:
- Fast-Conformer: Implementing the Fast-Conformer paper (https://arxiv.org/pdf/2305.05084), and probably other variants and optimizations.
- [WIP] Study Slam
- [WIP] Blog post about Audio LMs
- [WIP] Blog post about distil-whisper + tutorial
TTS:
- Finetuning Llasa: Blog Post Repo

Useful Resources

Torchaudio tutorials: https://pytorch.org/audio/main/index.html
WAVLab Lectures on Speech Recognition and Understanding: https://www.youtube.com/@wavlab3016/videos
Training recipe for Speech LMs: https://github.com/slp-rl/slamkit
Conversational AI Reading Group: https://poonehmousavi.github.io/rg
