Course: ELEC-E5551
Speech Recognition
Team:
- Aditya Kaushik
- Eduardo Rosado
- Thomas Spilsbury
Dataset: English Gigaword Corpus
Exploratory data analysis, 4 sections:
- Data loading, preprocessing, basic details
- Word clouds, common words
- Bigrams, Trigrams, Collocations
- Splitting data into model input and expected output
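
As a rough illustration of the last two sections, the sketch below counts bigrams and trigrams and splits a token sequence into (model input, expected output) pairs for next-word prediction. It is a minimal sketch using only the standard library, not the project's actual pipeline: the helper names `ngrams` and `context_target_pairs`, the context size of 2, and the toy sentence are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def context_target_pairs(tokens, context_size):
    """Split a token sequence into (context, next word) pairs.

    Each context is a tuple of `context_size` tokens (the model input) and
    the target is the token that immediately follows it (the expected output).
    """
    return [
        (tuple(tokens[i:i + context_size]), tokens[i + context_size])
        for i in range(len(tokens) - context_size)
    ]


# Toy sentence standing in for one preprocessed corpus line (hypothetical data).
tokens = "the cat sat on the mat the cat slept".split()

bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

# Assumed context size of 2: two words in, one word out.
pairs = context_target_pairs(tokens, 2)

print(bigram_counts.most_common(3))
print(pairs[:3])
```

Raw counts like these only surface frequent n-grams. Finding collocations usually needs an association score on top of the counts (for example pointwise mutual information), which this sketch leaves out.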
Baseline statistics: