meilisearch/charabia

Arabic script: Add a Normalizer removing Tatweel

ManyTheFish opened this issue · 0 comments

The Tatweel character (U+0640 ARABIC TATWEEL, also known as kashida) is used to stretch words for justification in Arabic script and adds nothing to the meaning of a word.
The problem is that Meilisearch does not recognize a word containing one or more Tatweel characters as identical to the same word written without them.
Adding a Normalizer that removes Tatweel, modeled on the existing one that removes control characters, would solve this.
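A minimal sketch of the normalization itself, in Rust (the language charabia is written in). It assumes nothing about charabia's actual Normalizer traits: `normalize_char`, `should_normalize` and `remove_tatweel` are hypothetical names chosen for illustration. The per-character decision (drop U+0640, keep everything else) is the core of the idea; a real implementation would plug it into charabia's own Normalizer interface, following the control-character normalizer.

```rust
use std::borrow::Cow;

/// ARABIC TATWEEL (U+0640): an elongation mark used only for justification.
const TATWEEL: char = '\u{0640}';

/// Per-character decision: drop Tatweel, keep every other character.
/// Hypothetical helper; not charabia's API.
fn normalize_char(c: char) -> Option<char> {
    if c == TATWEEL {
        None
    } else {
        Some(c)
    }
}

/// Cheap pre-check so words without Tatweel can be returned as-is.
/// Hypothetical helper; not charabia's API.
fn should_normalize(word: &str) -> bool {
    word.contains(TATWEEL)
}

/// Returns `word` with every Tatweel removed.
/// Borrows the input unchanged when there is nothing to remove.
fn remove_tatweel(word: &str) -> Cow<'_, str> {
    if should_normalize(word) {
        Cow::Owned(word.chars().filter_map(normalize_char).collect())
    } else {
        Cow::Borrowed(word)
    }
}

fn main() {
    // "كتاب" (kitab, "book") written with two Tatweel after the second letter.
    let stretched = "\u{0643}\u{062A}\u{0640}\u{0640}\u{0627}\u{0628}";
    let plain = "\u{0643}\u{062A}\u{0627}\u{0628}";

    // Both spellings normalize to the same word, so they would match in search.
    assert_eq!(remove_tatweel(stretched), plain);
    // A word without Tatweel is not copied.
    assert!(matches!(remove_tatweel(plain), Cow::Borrowed(_)));

    println!("{}", remove_tatweel(stretched));
}
```

Returning `Cow<str>` is a design choice for this sketch: most tokens contain no Tatweel, so the common case avoids allocating a new `String`.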


Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the general rules, you will find guides that make it easy to implement a Segmenter or a Normalizer.
Thanks a lot for your contribution! 🤝