Byte Level Subwords

This repository aims to recreate two widely used subword tokenization algorithms: Byte Pair Encoding (BPE) and byte-level BPE.
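The following minimal Python sketch illustrates the byte-level variant. It is not this repository's implementation, and the function names (`train_byte_bpe`, `encode`, `decode`) are hypothetical. The sketch makes these assumptions:

  • The base vocabulary is the 256 possible byte values.
  • Ties between equally frequent pairs go to the pair seen first.
  • The regex pre-tokenization and byte-to-character mapping that production tokenizers such as GPT-2's add are left out.

The character-level variant follows the same merge loop but starts from characters instead of UTF-8 bytes.

```python
from collections import Counter


def get_pair_counts(ids: list[int]) -> Counter:
    """Count every adjacent pair of token ids (in order of first occurrence)."""
    return Counter(zip(ids, ids[1:]))


def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out = []
    i = 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_byte_bpe(text: str, num_merges: int) -> dict[tuple[int, int], int]:
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`.

    Ids 0..255 are raw bytes; each learned merge gets the next id from 256 up.
    The returned dict preserves learning order (insertion order).
    """
    ids = list(text.encode("utf-8"))
    merges: dict[tuple[int, int], int] = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        pair = max(counts, key=counts.get)  # most frequent; ties -> first seen
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text: str, merges: dict[tuple[int, int], int]) -> list[int]:
    """Tokenize `text` by repeatedly applying the earliest-learned applicable merge."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        counts = get_pair_counts(ids)
        # Lower new id == learned earlier; unknown pairs rank as infinity.
        pair = min(counts, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break  # no learned merge applies any more
        ids = merge(ids, pair, merges[pair])
    return ids


def decode(ids: list[int], merges: dict[tuple[int, int], int]) -> str:
    """Map ids back to bytes, then decode the byte string as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # learning order guarantees a, b exist
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    merges = train_byte_bpe("aaabdaaabac", num_merges=3)
    tokens = encode("aaabdaaabac", merges)
    print(tokens)  # [258, 100, 258, 97, 99]
    print(decode(tokens, merges))  # aaabdaaabac
```

With three merges on `"aaabdaaabac"`, the sketch learns `aa`, then `aaa`, then `aaab`, and encodes the 11 input bytes as 5 ids. The base vocabulary covers every byte, so any string can be encoded without an unknown token, including text that never appeared in training (for example `"naïve café"`).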

Community Support

We value community involvement and welcome your support for this project:

  • Issues: Report any bugs or suggest improvements by opening an issue on GitHub.
  • Feature Requests: Share your ideas for additional features through GitHub discussions.
  • Pull Requests: Contribute directly to the codebase by submitting a pull request aligned with the project's goals.
  • Spread the Word: Help us reach a broader audience by sharing this project on social media and with colleagues and friends.


Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.
