severilov/DL-Audio-Course

Deep Learning Audio Course, 2023


Deep Learning for Audio Course, Fall 2023

Description

Topics covered in the course:

Digital Signal Processing
Automatic Speech Recognition (ASR)
Keyword spotting (KWS)
Text-to-Speech (TTS)
Voice Conversion
Unsupervised learning in Audio
Music Generation with NNs

Course materials

Materials

#	Date	Description	Slides	Video
1	September, 14	Lecture 1: Introduction and Digital Signal Processing	slides	video
2	September, 21	Lecture 2: Automatic Speech Recognition 1: WER, CTC, LAS, Beam Search	slides	video
3	September, 28	Seminar 1: Introduction, Spectrograms and Griffin-Lim	notebook	video
4	October, 5	Seminar 2: Levenshtein distance, WER, CER	notebook	video
5	October, 12	Lecture 3: Automatic Speech Recognition 2: RNN-T, Conformer, Whisper, Language models in ASR, BPE	slides	video
6	October, 19	Seminar 3: CTC, Beam Search	notebook	video
7	October, 26	Lecture 4: Keyword spotting (KWS)	slides	video
8	November, 2	Lecture 5: Text-to-speech: Tacotron, FastSpeech, Guided Attention	slides	video
9	November, 9	Seminar 4: Keyword spotting	notebook	video
10	November, 16	Seminar 5: Text-to-speech: Tacotron2	notebook	video
11	November, 23	Lecture 6: Text-to-speech: Neural Vocoders (WaveNet, PWGAN, DiffWave)	slides	video
12	November, 30	Lecture 7: Voice Conversion: AutoVC, CycleGAN-VC, StarGAN-VC	slides	video
13	December, 7	Lecture 8: Self-supervised learning in Audio	slides	video

Homeworks

Homework	Date	Deadline	Description	Link
1	October, 8	October, 22	Audio classification Audio preprocessing
2	November, 3	November, 18	ASR-1: CTC
3	November, 3	December, 3	ASR-2: RNN-T
[Additional]			Text-to-speech: FastPitch

Game rules

4 homeworks, 2 points each = 8 points
Final test = 2 points
Maximum: 8 + 2 = 10 points

Authors

Pavel Severilov

telegram: @severilov
e-mail: pseverilov@gmail.com

Daniel Knyazev

telegram: @Oorgien
e-mail: xmaximuskn@gmail.com