Run the tokenization.py file using the command
python3 2020201092_tokenization.py
To run the language model code, use the command:-
python3 language_model.py <smoothing type> <path to corpus>
example
python3 language_model.py k ./medical-corpus.txt
- Smoothing type can be k for Kneyser-Ney or w for Witten-Bell.
- an example:
$ python3 language_model.py k ./corpus.txt
input sentence: I am a man.
0.89972021