meilisearch/charabia

Implement Pinyin normalizer

Closed this issue · 0 comments

Today Meilisearch normalizes Chinese characters by converting traditional characters into simplified ones.

drawback

This normalization process doesn't seem to enhance the recall of Meilisearch.

enhancement

Following the official discussion about Chinese support in Meilisearch, it is more relevant to normalize Chinese characters by transliterating them into a Phonological version.
In order to have accurate phonology for Mandarin, we should normalize Chinese characters into Pinyin using the pinyin crates.

Files expected to be modified

Misc

related to product#503

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝