karpathy/minbpe

Byte2Byte Tokenizer

loretoparisi opened this issue · 0 comments

Implement a "token-free" (tokenization-free) encoder that works directly at the Unicode/UTF-8 character level, with no learned vocabulary or merges.
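
To make the request concrete, here is a minimal sketch of what such an encoder could look like, assuming the byte-level reading suggested by the issue title: every UTF-8 byte becomes one token id in the range 0..255. The class name `ByteTokenizer` and its `encode`/`decode` methods are hypothetical and are not part of minbpe; this is an illustration, not a proposed final API.

```python
class ByteTokenizer:
    """Hypothetical token-free tokenizer: one token id per UTF-8 byte."""

    # Fixed vocabulary: the 256 possible byte values. Nothing is trained.
    vocab_size = 256

    def encode(self, text: str) -> list[int]:
        # str -> UTF-8 bytes -> list of ints in 0..255
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        # list of ints -> bytes -> str; invalid byte sequences are
        # replaced with U+FFFD instead of raising an error
        return bytes(ids).decode("utf-8", errors="replace")


tok = ByteTokenizer()
ids = tok.encode("héllo")
print(ids)              # [104, 195, 169, 108, 108, 111]
print(tok.decode(ids))  # héllo
```

Note that "é" takes two ids here because it is two bytes in UTF-8. A character-level variant would map Unicode code points (via `ord`/`chr`) instead, at the cost of a much larger id range.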

Examples