meilisearch/charabia

Implement Vietnamese tokenizer for Meilisearch

kimyvgy opened this issue · 2 comments

Hello. It seems Meilisearch doesn't have a tokenizer for Vietnamese, does it? I would like to implement one.

I'm waiting for the Vietnamese tokenizer :D

Hello @kimyvgy and @anle-ct,
If you know of any Rust libraries that could improve Vietnamese language support, I'd be interested in your feedback on them.

This repository is very open to contributions. For instance, another contributor is currently implementing a Khmer language segmenter, so don't hesitate to do the same for your own language. I'd be pleased to help you with your work! Below is the link to the contributing file, where you can find tutorials for implementing a specialized normalizer or segmenter:
https://github.com/meilisearch/charabia/blob/main/CONTRIBUTING.md