Neural network LMs for ASR

Course: ELEC-E5551 Speech Recognition

Team:

  • Aditya Kaushik
  • Eduardo Rosado
  • Thomas Spilsbury

Literature review

Project plan

Dataset: English Gigaword Corpus


Exploratory data analysis, in four sections:

  1. Data loading, preprocessing, basic details
  2. Word clouds, common words
  3. Bigrams, trigrams, collocations
  4. Splitting data into model input and expected output
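
Sections 3 and 4 can be illustrated with a minimal sketch. It assumes whitespace tokenization and a next-word prediction setup, where the model input is a fixed-size window of words and the expected output is the word that follows. The helper names (tokenize, ngram_counts, make_examples), the context size, and the toy sentence are hypothetical and are not taken from the project code or from Gigaword.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simple stand-in for real preprocessing)."""
    return text.lower().split()


def ngram_counts(tokens, n):
    """Count contiguous n-grams in a list of tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def make_examples(tokens, context_size):
    """Pair each window of `context_size` tokens with the token that follows it."""
    return [
        (tokens[i:i + context_size], tokens[i + context_size])
        for i in range(len(tokens) - context_size)
    ]


# Toy sentence for illustration only; the real input would be Gigaword text.
toy_text = "the cat sat on the mat and the cat slept"
tokens = tokenize(toy_text)

bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
examples = make_examples(tokens, context_size=3)

print(bigrams.most_common(1))  # most frequent bigram with its count
print(examples[0])             # first (input window, expected next word) pair
```

On the real corpus the same counters give the frequent bigrams and trigrams for section 3. A proper collocation analysis would normally add an association score such as pointwise mutual information on top of the raw counts.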

sample_data


Baseline statistics: