Language Request: Kurdish Sorani (Central Kurdish)
makwanbarzan opened this issue · 1 comment
makwanbarzan commented
There's already a trained data file for the Latin-script dialect of the Kurdish language. Sorani is the second most widely used dialect of the language, and it would be great to have a trained data file for it in Tesseract.
The script is similar to Persian, apart from a few letters such as ژ، گ، ڤ، چ، ۆ, so it shouldn't take much effort to develop.
Thank you and I'm looking forward to getting a response.
stweil commented
All those characters are included in the script/Arabic model. Maybe that already works for Sorani text?
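One way to try this out is sketched below. It is only a rough check, not a tested workflow: it assumes the `tesseract` command is on the PATH and that `script/Arabic.traineddata` is installed under the tessdata directory. The code points are my reading of the letters quoted in the request, and the helper names and the sample image name are made up for illustration.

```python
import shutil
import subprocess
import unicodedata

# Assumed code points for the letters quoted in the request:
# U+0698, U+06AF, U+06A4, U+0686, U+06C6.
SORANI_LETTERS = "\u0698\u06AF\u06A4\u0686\u06C6"


def describe(letters=SORANI_LETTERS):
    """Return (code point, Unicode name) pairs for each letter."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in letters]


def missing_letters(ocr_text, letters=SORANI_LETTERS):
    """Return the letters that never appear in the OCR output."""
    return [ch for ch in letters if ch not in ocr_text]


def ocr_command(image_path, lang="script/Arabic"):
    """Build the command line; `stdout` makes tesseract print the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_ocr(image_path, lang="script/Arabic"):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    result = subprocess.run(
        ocr_command(image_path, lang),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

With a scanned page that is known to contain all five letters (say, a hypothetical `sorani_sample.png`), `missing_letters(run_ocr("sorani_sample.png"))` returning an empty list would suggest the model at least emits those characters. It says nothing about overall accuracy on Sorani text.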