This is a simple implementation of the Byte Pair Encoding (BPE) algorithm, used to tokenize text, and of a bigram word model.
Both were written as part of my research on language models and are my own original implementations!
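
For readers new to BPE, here is a minimal sketch of the merge-learning loop. It is not the code from this repository: the function names (`get_pair_counts`, `merge_pair`, `learn_bpe`, `encode`) are hypothetical, and it assumes whitespace-separated words and character-level symbols rather than raw bytes.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(text, num_merges):
    """Learn up to `num_merges` merges, most frequent pair first."""
    # Assumption: words are split on whitespace and start as character tuples.
    words = Counter(tuple(w) for w in text.split())
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break  # every word is already a single symbol
        best = max(counts, key=counts.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in order."""
    symbols = {tuple(word): 1}
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(next(iter(symbols)))
```

With this sketch, `learn_bpe("ab ab ab cb", 5)` stops after two merges because no pairs remain, and `encode("cab", merges)` splits the word into `c` and `ab`.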
This is my simple and readable implementation of the Byte Pair Encoding Algorithm and a Bigram Model.
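
As a rough illustration of what a bigram word model does, the sketch below counts which word follows which and turns the counts into conditional probabilities. The names `train_bigram` and `next_word_probs` are hypothetical, and the sketch assumes a pre-tokenized list of words with no smoothing; the repository's own model may differ.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Estimate P(next | prev) by relative frequency (no smoothing)."""
    following = counts.get(prev)
    if not following:
        return {}  # `prev` was never seen with a successor
    total = sum(following.values())
    return {word: c / total for word, c in following.items()}


tokens = "the cat sat on the mat".split()
model = train_bigram(tokens)
probs = next_word_probs(model, "the")  # "cat" and "mat" each get 0.5
```

An unsmoothed model like this assigns zero probability to unseen pairs, which is why the lookup returns an empty dictionary for a word with no recorded successor.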
Language: Python. License: GPL-3.0.