Deep Learning for Audio Course, Fall 2023

Description

Topics covered in the course:

  • Digital Signal Processing
  • Automatic Speech Recognition (ASR)
  • Keyword spotting (KWS)
  • Text-to-Speech (TTS)
  • Voice Conversion
  • Unsupervised learning in Audio
  • Music Generation with NNs

Course materials

Materials

| #  | Date         | Description                                                                                          | Slides / Notebook | Video |
|----|--------------|------------------------------------------------------------------------------------------------------|-------------------|-------|
| 1  | September 14 | Lecture 1: Introduction and Digital Signal Processing                                                | slides            | video |
| 2  | September 21 | Lecture 2: Automatic Speech Recognition 1: WER, CTC, LAS, Beam Search                                | slides            | video |
| 3  | September 28 | Seminar 1: Introduction, Spectrograms and Griffin-Lim                                                | notebook          | video |
| 4  | October 5    | Seminar 2: Levenshtein distance, WER, CER                                                            | notebook          | video |
| 5  | October 12   | Lecture 3: Automatic Speech Recognition 2: RNN-T, Conformer, Whisper, Language models in ASR, BPE    | slides            | video |
| 6  | October 19   | Seminar 3: CTC, Beam Search                                                                          | notebook          | video |
| 7  | October 26   | Lecture 4: Keyword spotting (KWS)                                                                    | slides            | video |
| 8  | November 2   | Lecture 5: Text-to-speech: Tacotron, FastSpeech, Guided Attention                                    | slides            | video |
| 9  | November 9   | Seminar 4: Keyword spotting                                                                          | notebook          | video |
| 10 | November 16  | Seminar 5: Text-to-speech: Tacotron2                                                                 | notebook          | video |
| 11 | November 23  | Lecture 6: Text-to-speech: Neural Vocoders (WaveNet, PWGAN, DiffWave)                                | slides            | video |
| 12 | November 30  | Lecture 7: Voice Conversion: AutoVC, CycleGAN-VC, StarGAN-VC                                         | slides            | video |
| 13 | December 7   | Lecture 8: Self-supervised learning in Audio                                                         | slides            | video |

Homeworks

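| Homework     | Date       | Deadline    | Description                                      | Link           |
|--------------|------------|-------------|--------------------------------------------------|----------------|
| 1            | October 8  | October 22  | 1. Audio classification; 2. Audio preprocessing  | Open in GitHub |
| 2            | November 3 | November 18 | ASR-1: CTC                                       | Open in GitHub |
| 3            | November 3 | December 3  | ASR-2: RNN-T                                     | Open in GitHub |
| [Additional] |            |             | Text-to-speech: FastPitch                        | Open in GitHub |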

Game rules

  • 4 homeworks, 2 points each = 8 points
  • final test = 2 points
  • maximum points: 8 + 2 = 10 points

Authors

Pavel Severilov

Daniel Knyazev