/TokenizationBenchmarks

Comparison of various supervised and unsupervised tokenization algorithms on a Chinese corpus

Primary Language: Python
