Issues
`into_tokenizer` don't work with stop_words
#319 opened by ManyTheFish - 11
latin-camelcase feature make wrong segmentation
#289 opened by hamano - 3
Update wana_kana to 4.0.0
#313 opened by tats-u - 13
Add Math symbols in the default separator list
#300 opened by ManyTheFish - 6
Normalize "œ" / "æ" into "oe" / "ae"
#268 opened by ManyTheFish - 0
Rework Chinese Pinyin normalizer
#285 opened by ManyTheFish - 5
Cross-compiling charabia for arm
#265 opened by chiru-arh - 3
Tag and release new version?
#276 opened by 6543 - 0
Ð vs Đ differentiate
#245 opened by ngdbao - 7
Compilation warnings when not using default features
#260 opened by timvisee - 0
Compiler failure without vietnamese feature
#258 opened by timvisee - 0
Fix kvariant CI
#254 opened by ManyTheFish - 2
Add Khmer support information to README
#246 opened by curquiza - 0
add support for khmer language
#200 opened by xshadowlegendx - 0
Chinese segmentation not correct
#226 opened by sivdead - 7
Upgrade dependencies
#151 opened by curquiza - 2
Implement Vietnamese tokenizer for Meilisearch
#174 opened by kimyvgy - 3
Publish hfhchan kvariants wrapper on crates.io
#184 opened by ManyTheFish - 9
Arabic script: Implement specialized Segmenter
#133 opened by ManyTheFish - 0
Fix compilation without `greek` feature enabled
#201 opened by akeamc - 2
Create a CI keeping kvariants up-to-date
#185 opened by ManyTheFish - 8
Arabic script: Add an Normalizer removing Tatweel
#186 opened by ManyTheFish - 0
Greek script: Normalize Sigma
#179 opened by ManyTheFish - 0
Greek Script: Normalize accents
#178 opened by ManyTheFish - 0
clippy CI fail with latest toolchain
#176 opened by choznerol - 0
Enhance Chinese normalizer by unifying `Z`, `Simplified`, and `Semantic` variants
#144 opened by ManyTheFish - 0
Refactor normalizers
#156 opened by ManyTheFish - 8
Implement a Japanese specialized Normalizer
#131 opened by ManyTheFish - 0
Korean support
#153 opened by qbx2 - 2
Add an allowlist to the tokenizer builder
#132 opened by ManyTheFish - 0
Move the FST based Segmenter in a standalone file
#130 opened by ManyTheFish - 2
Implement Jyutping normalizer
#134 opened by ManyTheFish - 0
Implement Pinyin normalizer
#135 opened by ManyTheFish - 2
Upgrade Whatlang dependency
#141 opened by ManyTheFish