mathewthe2/Game2Text

Make other language support more obvious

artjomsR opened this issue · 2 comments

This tool is correctly advertised as working for all languages, but out of the box it works only with Japanese, and it's not obvious how to use it for other languages. Making this more obvious would make the tool more accessible to all language learners. Suggested changes:

  1. Add an option to UI settings to select a language for OCR

OR

  1. Add documentation to make it more obvious how the user can do the same manually. Here's my attempt:
    In config.ini, look up the code for your language at https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html and replace jpn with it in these lines:
tesseract_language = jpn
ocr_space_language = jpn

Then download XYZ.traineddata for your language (where XYZ is the language code) from https://github.com/tesseract-ocr/tessdata_best/ (OR https://github.com/tesseract-ocr/tessdata) and put it in the game2text\resources\bin\win\tesseract\tessdata folder.
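
For anyone who would rather script these manual steps, here is a minimal Python sketch. It is not part of Game2Text: the function names `set_ocr_language` and `traineddata_path` are made up for illustration, and it only assumes that config.ini contains the two `key = value` lines shown above. It edits the text directly instead of parsing the file, so the rest of config.ini is left untouched. The two keys are set separately in case the OCR.space code for your language differs from the Tesseract one.

```python
import re
from pathlib import PureWindowsPath


def set_ocr_language(config_text: str, tesseract_lang: str, ocr_space_lang: str) -> str:
    """Return config_text with both OCR language settings replaced.

    Hypothetical helper: works on the raw text of config.ini and only
    touches the two keys named in this issue.
    """
    new_values = {
        "tesseract_language": tesseract_lang,
        "ocr_space_language": ocr_space_lang,
    }
    for key, value in new_values.items():
        # Match "key = anything" on a single line; keep the "key = " prefix.
        pattern = re.compile(
            rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*)[^\r\n]*", re.MULTILINE
        )
        config_text, count = pattern.subn(
            lambda m, v=value: m.group(1) + v, config_text
        )
        if count == 0:
            raise KeyError(f"{key} not found in config text")
    return config_text


def traineddata_path(game2text_root: str, lang: str) -> PureWindowsPath:
    """Where the downloaded <lang>.traineddata file should be placed (Windows layout)."""
    return (
        PureWindowsPath(game2text_root)
        / "resources" / "bin" / "win" / "tesseract" / "tessdata"
        / f"{lang}.traineddata"
    )
```

To use it, read config.ini into a string, pass it through `set_ocr_language` with your language codes, write the result back, and then save the downloaded traineddata file at the path returned by `traineddata_path`.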

How does one actually use it for other languages?

@drewboardman You should be able to follow the instructions in my comment above (the part after "Here's my attempt:"). This worked for me with a non-Japanese language at the time of writing that comment.