Arabic script: Add a Normalizer removing Tatweel
ManyTheFish opened this issue · 0 comments
The Tatweel character (U+0640) is used for justification in Arabic script and doesn't change the meaning of a word.
The issue is that Meilisearch doesn't recognize that a word containing one or more Tatweels is identical to the same word without any Tatweel.
Adding a Normalizer that removes Tatweel, modeled on the one that removes control characters, would solve this.
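A minimal sketch of the idea in plain Rust. It assumes nothing about charabia's actual `Normalizer` trait: `normalize_char` and `remove_tatweel` are hypothetical names chosen for illustration. Because Tatweel is the single code point U+0640 (ARABIC TATWEEL), removing it is a per-character filter, in the same spirit as skipping control characters.

```rust
/// ARABIC TATWEEL (Kashida), U+0640: a purely visual stretching character.
const TATWEEL: char = '\u{0640}';

/// Hypothetical char-level normalizer (not charabia's real API):
/// returns `None` to drop a Tatweel and `Some(c)` to keep any other char.
fn normalize_char(c: char) -> Option<char> {
    if c == TATWEEL {
        None
    } else {
        Some(c)
    }
}

/// Hypothetical string-level helper: strips every Tatweel from `text`,
/// leaving all other characters untouched and in order.
fn remove_tatweel(text: &str) -> String {
    text.chars().filter_map(normalize_char).collect()
}
```

With this, `remove_tatweel("كـتـاب")` and `remove_tatweel("كتاب")` return the same string, so both spellings would be indexed and matched as the same word.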
Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the general rules, you can find guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your contribution! 🤝