Illegible words recognized for Persian language
ImanX opened this issue · 2 comments
Summary:
I integrated tess-two and imported the Persian .traineddata file into my project.
tess-two runs, but it returns illegible output like:
ـاغ {.
٥ ج.: { ٠
٤ \ ٤2,
} 13
ؤ. …
« چ \ ة 8۱
:} 3 ١.٠
٠ ء,٬, "و ۱١ |
), ٠
} ( \ ق {۰
| } چ
د … ة ؛ ٠
؛ \ ؤ ٠٠
دغ٬ ؤ \ 3
حس {؛ | غ
3 ق : « }
دا ) { 3 د.
» < {:
٠ دێ .
؛ ,? 33٠ ,
{ -3 ٠_
{سم
Tess-two version: 5.4.1
Android version: 6.0
Phone/device model: Samsung S6
Phone/device architecture (armeabi, armeabi-v7a, x86, mips, arm64-v8a, x86_64, mips64): ARM64
@ImanX tess-two 5.4.1 is more than 3 years old; you should try the latest version, 9.0.0.
You might try asking on the Tesseract mailing list and including a sample input image so you can get suggestions about what image processing to do in order to get a better result. While your current result is clearly not what you're looking for, it does look like Tesseract is working as intended. Robyer's suggestion of trying a newer version is a good one too.
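As one illustration of the kind of image processing mentioned above, here is a minimal sketch of Otsu binarization in plain Java. It is not taken from tess-two or Tesseract, and the input image was not posted in this thread, so whether it helps in this case is an assumption. The class name `OtsuBinarizer` and its methods are hypothetical; the input is assumed to be an array of 8-bit grayscale pixel values that you have already extracted from your image.

```java
/**
 * Hypothetical helper (not part of tess-two): converts grayscale pixels
 * to pure black and white using Otsu's method before the image is handed to OCR.
 */
public final class OtsuBinarizer {
    private OtsuBinarizer() {}

    /**
     * Picks the threshold in [0, 255] that maximizes the between-class
     * variance of the "dark" (<= threshold) and "light" (> threshold) pixels.
     */
    public static int otsuThreshold(int[] gray) {
        // Histogram of intensities, clamping out-of-range values to [0, 255].
        int[] hist = new int[256];
        for (int v : gray) {
            hist[Math.max(0, Math.min(255, v))]++;
        }

        long total = gray.length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (long) i * hist[i];
        }

        long weightLow = 0;      // number of pixels at or below the candidate threshold
        long sumLow = 0;         // sum of their intensities
        double bestVariance = -1.0;
        int bestThreshold = 0;

        for (int t = 0; t < 256; t++) {
            weightLow += hist[t];
            sumLow += (long) t * hist[t];
            if (weightLow == 0) {
                continue;        // no dark pixels yet
            }
            long weightHigh = total - weightLow;
            if (weightHigh == 0) {
                break;           // every pixel is already in the dark class
            }
            double meanLow = (double) sumLow / weightLow;
            double meanHigh = (double) (sumAll - sumLow) / weightHigh;
            double diff = meanLow - meanHigh;
            double variance = (double) weightLow * weightHigh * diff * diff;
            if (variance > bestVariance) {
                bestVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    /** Maps pixels above the threshold to 255 (white) and all others to 0 (black). */
    public static int[] binarize(int[] gray, int threshold) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = gray[i] > threshold ? 255 : 0;
        }
        return out;
    }
}
```

A typical use would be `int[] bw = OtsuBinarizer.binarize(pixels, OtsuBinarizer.otsuThreshold(pixels));`, then rebuilding the image from `bw` before passing it to the OCR engine. Other steps such as rescaling or deskewing may matter more, depending on the input.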