Implement Pinyin normalizer
Today Meilisearch normalizes Chinese characters by converting traditional characters into simplified ones.
drawback
This normalization process doesn't seem to enhance the recall of Meilisearch.
enhancement
Following the official discussion about Chinese support in Meilisearch, it is more relevant to normalize Chinese characters by transliterating them into a phonological form.
To get accurate phonology for Mandarin, we should normalize Chinese characters into Pinyin using the pinyin crate.
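As a rough illustration of the idea, the sketch below replaces each Chinese character with a Pinyin reading and passes every other character through unchanged. It is only a sketch under stated assumptions: `toy_pinyin` and `normalize_to_pinyin` are hypothetical names, the hard-coded table stands in for the dictionary that the pinyin crate would provide, and dropping tone marks is an assumption rather than something this issue specifies.

```rust
// Hypothetical stand-in for a real Pinyin dictionary: a tiny hard-coded
// table that only covers the characters used in this example.
// A real normalizer would presumably delegate this lookup to the pinyin crate.
fn toy_pinyin(c: char) -> Option<&'static str> {
    match c {
        '中' => Some("zhong"),
        // The simplified and traditional forms share the same reading.
        '国' | '國' => Some("guo"),
        '人' => Some("ren"),
        _ => None,
    }
}

/// Replace every character that has a known reading with its (toneless)
/// Pinyin and copy every other character through unchanged.
pub fn normalize_to_pinyin(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match toy_pinyin(c) {
            Some(reading) => out.push_str(reading),
            None => out.push(c),
        }
    }
    out
}
```

With this table, `normalize_to_pinyin("中國")` and `normalize_to_pinyin("中国")` produce the same string, which shows how a phonological form can also cover the traditional/simplified distinction handled by the current normalizer.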
Files expected to be modified
Misc
related to product#503
Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your contribution! 🤝