nlp-a2-N-grams

The data directory contains two CSV files: the original unfiltered dataset and the dataset after all pre-processing.
pre-processing.py contains the code that pre-processes the comments in parallel.
Utils.py contains utility functions used by the N-gram language model.
ngrams.py contains the implementation of the N-gram language model class and its methods.
models.py is the main experiment file where we instantiate the model for different n values and calculate the perplexity and log(perplexity).
plotting.py is used to plot the perplexity values for inference and analysis.
Smoothing_Comparison.txt stores the output of models.py: a comparison of the perplexities obtained with different smoothing techniques on the n-gram models.
The repo also contains the final documentation for the assignment in PDF format.
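
For orientation, here is a minimal sketch of the kind of experiment models.py runs: fit an n-gram model for several values of n and report perplexity and log(perplexity). It is not the code in ngrams.py. The class name `AddKNgramModel`, its methods, the `<s>`/`</s>` padding tokens, and the choice of add-k smoothing are all assumptions made for illustration; the smoothing techniques actually compared are the ones listed in Smoothing_Comparison.txt.

```python
import math
from collections import Counter


class AddKNgramModel:
    """Hypothetical toy n-gram model with add-k smoothing (illustration only)."""

    def __init__(self, n, k=1.0):
        self.n = n                       # order of the model (1 = unigram, 2 = bigram, ...)
        self.k = k                       # smoothing constant; k=1 is Laplace smoothing
        self.ngram_counts = Counter()    # counts of (context..., word) tuples
        self.context_counts = Counter()  # counts of context tuples
        self.vocab = set()

    def _pad(self, tokens):
        # Assumed convention: n-1 start markers and one end marker per sentence.
        return ["<s>"] * (self.n - 1) + list(tokens) + ["</s>"]

    def fit(self, sentences):
        for tokens in sentences:
            padded = self._pad(tokens)
            self.vocab.update(padded)
            for i in range(self.n - 1, len(padded)):
                context = tuple(padded[i - self.n + 1:i])
                self.ngram_counts[context + (padded[i],)] += 1
                self.context_counts[context] += 1
        return self

    def prob(self, context, word):
        # Add-k estimate: (count(context, word) + k) / (count(context) + k * |V|)
        num = self.ngram_counts[tuple(context) + (word,)] + self.k
        den = self.context_counts[tuple(context)] + self.k * len(self.vocab)
        return num / den

    def log_perplexity(self, sentences):
        # Average negative log-probability per predicted token (natural log).
        log_prob_sum, n_tokens = 0.0, 0
        for tokens in sentences:
            padded = self._pad(tokens)
            for i in range(self.n - 1, len(padded)):
                context = padded[i - self.n + 1:i]
                log_prob_sum += math.log(self.prob(context, padded[i]))
                n_tokens += 1
        return -log_prob_sum / n_tokens

    def perplexity(self, sentences):
        return math.exp(self.log_perplexity(sentences))


# Made-up toy corpus, not the dataset in the data directory.
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
for n in (1, 2, 3):
    model = AddKNgramModel(n).fit(train)
    print(n, model.perplexity(train), model.log_perplexity(train))
```

Reporting log(perplexity) alongside perplexity is convenient because perplexity can span several orders of magnitude across n values and smoothing methods, which makes the raw values hard to compare on one plot.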

RahulVC02/nlp-a2-N-grams