minbpe.c

minbpe.c is a minimal, clean implementation, in pure C, of the Byte Pair Encoding (BPE) algorithm commonly used for LLM tokenization. The project is inspired by minbpe by @karpathy.
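For readers new to BPE, the core training loop fits in a few functions. Start from the raw bytes of the text (token ids 0..255), find the most frequent adjacent pair of ids, replace every occurrence of it with a new id (256, 257, ...), and repeat. The sketch below is illustrative only and is not taken from this repository: the names `bpe_train`, `most_frequent_pair`, `merge_pair` and `MAX_MERGES` are hypothetical. Breaking ties in favour of the pair seen first and stopping early once no pair repeats are assumptions of this sketch. Pair counting is a naive O(n^2) scan, chosen for brevity over speed.

```c
#include <assert.h>
#include <string.h>

#define MAX_MERGES 256 /* hypothetical cap on merges learned by this sketch */

/* Find the most frequent adjacent pair (a, b) in ids[0..n).
   Returns its count (0 if n < 2). Ties go to the pair seen first. */
static int most_frequent_pair(const int *ids, int n, int *a, int *b) {
    int best = 0;
    for (int i = 0; i + 1 < n; i++) {
        int count = 0;
        for (int j = 0; j + 1 < n; j++)
            if (ids[j] == ids[i] && ids[j + 1] == ids[i + 1])
                count++;
        if (count > best) {
            best = count;
            *a = ids[i];
            *b = ids[i + 1];
        }
    }
    return best;
}

/* Replace each occurrence of (a, b), scanning left to right, with new_id.
   Works in place (the write index never passes the read index);
   returns the new length. */
static int merge_pair(int *ids, int n, int a, int b, int new_id) {
    int out = 0;
    for (int i = 0; i < n;) {
        if (i + 1 < n && ids[i] == a && ids[i + 1] == b) {
            ids[out++] = new_id;
            i += 2;
        } else {
            ids[out++] = ids[i++];
        }
    }
    return out;
}

/* Learn up to num_merges merges from text. Ids 0..255 are raw bytes;
   merge k creates id 256 + k from the pair (merges[k][0], merges[k][1]).
   ids must have room for strlen(text) ints; on return it holds the
   compressed sequence and *n its length. Returns the merges learned. */
static int bpe_train(const char *text, int num_merges,
                     int merges[][2], int *ids, int *n) {
    assert(num_merges <= MAX_MERGES);
    int len = (int)strlen(text);
    for (int i = 0; i < len; i++)
        ids[i] = (unsigned char)text[i];
    int k = 0;
    for (; k < num_merges; k++) {
        int a = 0, b = 0;
        /* stop when no pair occurs at least twice (a choice of this sketch) */
        if (most_frequent_pair(ids, len, &a, &b) < 2)
            break;
        merges[k][0] = a;
        merges[k][1] = b;
        len = merge_pair(ids, len, a, b, 256 + k);
    }
    *n = len;
    return k;
}
```

As a worked example, training on "aaabdaaabac" with up to 10 merges learns three merges: (a, a) becomes 256, then (256, a) becomes 257, then (257, b) becomes 258. This compresses the 11 input bytes to the 5 tokens 258, 'd', 258, 'a', 'c'. Storing the merges in learning order is what later lets an encoder replay them on new text.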