Tokenizer_and_Bigram

This is my simple and readable implementation of the Byte Pair Encoding Algorithm and a Bigram Model.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

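This repository contains a simple implementation of the Byte Pair Encoding (BPE) algorithm, which tokenizes text by repeatedly merging the most frequent pair of adjacent symbols, and a word-level bigram model, which predicts each word from the word before it.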
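For readers new to the two techniques, here is a minimal sketch of how they can work. It is not the code from this repository: the function names (`train_bpe`, `merge_pair`, `train_bigram`, `next_word_probs`), the choice to start from UTF-8 bytes, and the rule of stopping once no pair repeats are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules, starting from the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new token id
    for step in range(num_merges):
        pair_counts = Counter(zip(ids, ids[1:]))
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        if pair_counts[best] < 2:
            break  # no pair repeats, so another merge would not shorten the sequence
        new_id = 256 + step  # ids 0..255 are reserved for raw bytes
        merges[best] = new_id
        ids = merge_pair(ids, best, new_id)
    return ids, merges


def train_bigram(words):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the counts for `prev` into a probability distribution over next words."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return {word: count / total for word, count in following.items()}


ids, merges = train_bpe("aaabdaaabac", num_merges=3)
bigram = train_bigram("the cat sat on the mat".split())
probs = next_word_probs(bigram, "the")
```

In this sketch the tokenizer works on bytes, so any input string can be encoded, while the bigram model works on whitespace-separated words and uses raw relative frequencies with no smoothing. A word that was never seen as a predecessor yields an empty distribution.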

Both were created as part of my research on language models and are my own original implementations.

https://www.linkedin.com/in/peter-v-334609211/