A Rust implementation of minbpe.
A tokenizer is a tool that breaks text down into smaller units called tokens. It is a standard preprocessing step in natural language processing tasks such as text classification, sentiment analysis, and machine translation. This tokenizer implements byte pair encoding (BPE) and is inspired by the minbpe project.
cargo run -- aaabdaaabac
Expected Result
Encoded: [258, 100, 258, 97, 99]
Decoded: "aaabdaaabac"
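The encoded ids come from learned merges: ids 0..=255 are raw byte values, and each merge of the most frequent adjacent pair creates a new id starting at 256. The loop below is a minimal, self-contained sketch of that training-and-encoding process, not the crate's actual API; its stopping rule (halt when no pair occurs twice) stands in for minbpe's target vocabulary size.

```rust
use std::collections::BTreeMap;

// Find the most frequent adjacent pair of token ids, if any pair repeats.
fn most_frequent_pair(ids: &[u32]) -> Option<(u32, u32)> {
    let mut counts: BTreeMap<(u32, u32), usize> = BTreeMap::new();
    for w in ids.windows(2) {
        *counts.entry((w[0], w[1])).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .filter(|&(_, c)| c >= 2)
        .max_by_key(|&(_, c)| c)
        .map(|(pair, _)| pair)
}

// Replace every occurrence of `pair` with the single new id `new_id`.
fn merge(ids: &[u32], pair: (u32, u32), new_id: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(ids.len());
    let mut i = 0;
    while i < ids.len() {
        if i + 1 < ids.len() && (ids[i], ids[i + 1]) == pair {
            out.push(new_id);
            i += 2;
        } else {
            out.push(ids[i]);
            i += 1;
        }
    }
    out
}

// Start from raw bytes and repeatedly merge the most frequent pair,
// assigning new ids 256, 257, ... until no pair repeats.
fn train_and_encode(text: &str) -> Vec<u32> {
    let mut ids: Vec<u32> = text.bytes().map(u32::from).collect();
    let mut next_id = 256; // byte values occupy 0..=255
    while let Some(pair) = most_frequent_pair(&ids) {
        ids = merge(&ids, pair, next_id);
        next_id += 1;
    }
    ids
}

fn main() {
    println!("{:?}", train_and_encode("aaabdaaabac")); // [258, 100, 258, 97, 99]
}
```

For "aaabdaaabac", the merges learned are (97, 97) -> 256, then (256, 97) -> 257, then (257, 98) -> 258, which is why 258 ("aaab") appears twice in the encoded output.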