Ngram_Language_Model

Ngram modelling of some tweets

Word-level vs Char-level N-gram Language Models

The n-gram model, among the most intuitive computational language models, can be implemented at the word level or at the character level, with N left as a free choice. This project presents a fair comparison of unigram and bigram models at both the word and the character level.
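To make the two abstraction levels concrete, here is a minimal sketch of n-gram extraction at the word and character level. The function name `extract_ngrams`, its `level` parameter, whitespace tokenization, and the sample tweet are all assumptions for illustration and are not taken from this repository's code.

```python
from collections import Counter


def extract_ngrams(text, n, level="word"):
    """Return the n-grams of `text` as a list of tuples.

    level="word" splits on whitespace (an assumed tokenization);
    level="char" treats every character, spaces included, as a unit.
    """
    units = text.split() if level == "word" else list(text)
    # Slide a window of size n over the units; empty if the text is too short.
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


# Hypothetical sample tweet, used only for illustration.
tweet = "good morning good people"

word_unigrams = Counter(extract_ngrams(tweet, 1, "word"))
word_bigrams = Counter(extract_ngrams(tweet, 2, "word"))
char_bigrams = Counter(extract_ngrams(tweet, 2, "char"))
```

With the same text, the character-level model sees many more, much shorter units than the word-level model, so its vocabulary is smaller and its counts are denser.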


Results

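| Model   | Abstraction Level | Data Partition | F1-macro | F1-micro | Accuracy | Precision | Recall |
|---------|-------------------|----------------|----------|----------|----------|-----------|--------|
| unigram | word              | train          | 0.878    | 0.878    | 0.878    | 0.769     | 0.955  |
| unigram | word              | test           | 0.642    | 0.644    | 0.644    | 0.556     | 0.817  |
| unigram | char              | train          | 0.557    | 0.590    | 0.590    | 0.517     | 0.379  |
| unigram | char              | test           | 0.539    | 0.558    | 0.558    | 0.477     | 0.372  |
| bigram  | word              | train          | 0.980    | 0.981    | 0.981    | 0.960     | 0.995  |
| bigram  | word              | test           | 0.539    | 0.567    | 0.567    | 0.497     | 0.950  |
| bigram  | char              | train          | 0.679    | 0.687    | 0.687    | 0.620     | 0.650  |
| bigram  | char              | test           | 0.640    | 0.648    | 0.648    | 0.587     | 0.590  |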
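For readers unfamiliar with the reported metrics, the sketch below computes them from true and predicted labels in plain Python. It assumes a single-label task in which precision and recall are reported for one positive class; that reading, the function name `classification_metrics`, and the toy labels are assumptions and are not taken from this repository. In single-label classification micro-averaged F1 equals accuracy, which is consistent with the identical F1-micro and accuracy columns above.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute macro F1, micro F1, accuracy, and positive-class precision/recall."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    total_tp = total_fp = total_fn = 0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = (precision, recall, f1)
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    return {
        # Macro F1: unweighted mean of the per-class F1 scores.
        "f1_macro": sum(f1 for _, _, f1 in per_class.values()) / len(labels),
        # Micro F1: F1 computed from counts pooled over all classes.
        "f1_micro": 2 * total_tp / (2 * total_tp + total_fp + total_fn),
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": per_class[positive][0],
        "recall": per_class[positive][1],
    }


# Toy labels, for illustration only (not data from this project).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```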

Resources