
Add Filipino lang

JohnHenryGaspay opened this issue · 7 comments

Would it also be good if you could support the Filipino language, which is used here in the Philippines?

@theraysmith I've noticed that Filipino is not in the list of language data.

My training text corpus does not distinguish between fil and tgl, although ISO 639-2/T lists them as distinct codes. For some reason that I can't remember now, the language code switched from tgl to fil in the "best" models that I pushed recently.

Does the fil language do what you want?
If not please try to explain why.
You could also try Latin, which attempts to cover all latin-based languages.

@amitdo I've tried adding it to the language folders, but when I select fil as the language the app always shuts down.

@theraysmith Yes, our national language here in the Philippines is Filipino (fil), and Tagalog (tgl) is the old name for it. I've tried Latin, but it's not working.

I tested just now with both best/fil and tgl (4.00.00alpha traineddata files), and both work with tesseract built from the latest GitHub code.

 tesseract fil-test.png fil-test-best-fil --oem 1 --psm 6 -l best/fil --tessdata-dir ../

 tesseract fil-test.png fil-test-tgl --oem 1 --psm 6 -l tgl --tessdata-dir ../

Files attached. To me best/fil seems more accurate.
The test image is a snapshot taken from a page of the Tagalog (tgl) Wikipedia.


I've tried adding it to the language folders, but when I select fil as the language the app always shuts down.

You should try running Tesseract from the command line.
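As a possible first step before that command-line run, the sketch below checks whether a fil.traineddata file is actually present in the directory that will be passed to --tessdata-dir. The helper name has_traineddata and the default ./tessdata location are assumptions made for illustration; neither comes from Tesseract itself, and a missing file is only one of several reasons an app might shut down.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): succeeds when
# the directory $1 contains the file "$2.traineddata".
has_traineddata() {
    dir=$1
    lang=$2
    [ -f "$dir/$lang.traineddata" ]
}

# Assumed location of the traineddata files; adjust to your setup.
TESSDATA_DIR=${TESSDATA_DIR:-./tessdata}

if has_traineddata "$TESSDATA_DIR" fil; then
    echo "found $TESSDATA_DIR/fil.traineddata"
    # Print a command to try next, using the same flags as the test above.
    echo "try: tesseract fil-test.png out --oem 1 --psm 6 -l fil --tessdata-dir $TESSDATA_DIR"
else
    echo "missing $TESSDATA_DIR/fil.traineddata" >&2
fi
```

If the file is present, running `tesseract --list-langs --tessdata-dir <dir>` should also show fil among the available languages, and any error printed by the command-line run is usually more informative than an app that simply closes.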