A linguistic system built without statistics is often fragile: it may break when run on real data, and it has no principled way to resolve ambiguities. The n-gram model is a useful starting point that can give reasonable results even though it does not yet encode any real linguistic knowledge.
This assignment will guide you through the implementation of n-gram language models with various approaches to handling sparse data. You will also apply your model to the task of classification.
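As a rough illustration of what "handling sparse data" can mean, here is a minimal sketch of a bigram model with add-k (Laplace) smoothing. It is only one of several possible approaches and is not necessarily the one the handbook prescribes. The function names `train_bigram` and `bigram_prob`, the `<s>`/`</s>` padding symbols, and the toy corpus are assumptions made for this example.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences."""
    context_counts = Counter()  # how often each word appears as a left context
    bigram_counts = Counter()   # how often each (prev, word) pair appears
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]  # hypothetical boundary symbols
        context_counts.update(padded[:-1])
        bigram_counts.update(zip(padded, padded[1:]))
    # Words the model can predict: every observed token plus end-of-sentence.
    vocab = {w for tokens in sentences for w in tokens} | {"</s>"}
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, context_counts, bigram_counts, vocab, k=1.0):
    """Add-k estimate of P(word | prev); unseen bigrams get non-zero mass."""
    numerator = bigram_counts[(prev, word)] + k
    denominator = context_counts[prev] + k * len(vocab)
    return numerator / denominator


# Toy corpus, invented purely for illustration.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
contexts, bigrams, vocab = train_bigram(corpus)

p_seen = bigram_prob("the", "cat", contexts, bigrams, vocab)    # seen bigram
p_unseen = bigram_prob("cat", "dog", contexts, bigrams, vocab)  # unseen bigram
```

Without smoothing, the unseen bigram ("cat", "dog") would receive probability zero, which makes any sentence containing it impossible under the model. Adding k to every count avoids this at the cost of taking some probability mass away from observed bigrams.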
A1.pdf: Assignment Handbook
NLP_A1.ipynb: Jupyter Notebook template. (Google Colab version)
Data files for Section 1 (n-gram language model)
Data files for Section 2 (preposition prediction)