Issues
- 5
Question about Encoder Logic
#87 opened by JackxTong - 0
One problem in the annotations of `test_wikipedia_example` in the `tests/test_tokenizer` file
#93 opened by donglinkang2021 - 0
LLM is worse at non-English languages
#92 opened by 7CD - 2
Interface to remove Unseen or rare encoding paths from downloaded models given new dataset
#36 opened by cmollgaard - 0
- 2
Instead of finding the one pair with the highest frequency and merging it at each step, do the highest N pairs
#69 opened by hippietrail - 0
LLM as calc
#81 opened by michaelshekasta - 0
OSS-Fuzz Integration
#80 opened by ennamarie19 - 3
- 0
BPE in Haskell
#79 opened by BobMcDear - 3
The regular expressions break all scripts with combining marks in the middle of the syllable
#73 opened by ajaykg - 0
What to support GPT-4O tokenizer?
#77 opened by echo-valor - 0
Notebook Issue In Google Colab
#74 opened by kelixirr - 0
- 0
Would using prompts that contain concatenated words to reduce token count negatively affect results
#61 opened by hatgit - 1
Optimizing minbpe to also support video tokenization (extract low-dimensional latent patches from video frames)
#48 opened by Jaykef - 11
Faster BPE
#5 opened by zouharvi - 0
- 0
"regex.py" file name conflict
#59 opened by mogomaa79 - 16
Alternative to bpe
#50 opened by marcov-dart - 0
_
#57 opened by momonga-ml - 1
- 4
Minbpe as a potential course
#19 opened by ViswanathaReddyGajjala - 1
- 0
A thanks from self-learners community
#45 opened by IamExperimenting - 2
Vectorized BasicTokenizer.train?
#29 opened by kuprel - 0
Byte2Byte Tokenizer
#37 opened by loretoparisi - 3
`regex.py` is feature-engineering and will probably degrade performance when scaling to many languages
#31 opened by domschl - 3
- 5
Steal token visualisation code
#11 opened by hauntsaninja - 0
What is the difference about the bbpe vocab decode method in minbpe against huggingface transformers?
#15 opened by lovekittynine - 4
Saving/Loading tokenizer from disk
#2 opened by cyrilzakka - 2
Loading data from disk partially
#8 opened by kathir-ks