amazon-science/semimtr-text-recognition

Question about other languages support


Thanks for open-sourcing this. I want to test how well the model recognizes Chinese text. What changes should I make?

Hi,
It requires two small changes, but of course you will also need to create a Chinese text dataset and re-train the models.

As for the changes, first create a .txt file that contains your character set (see the example in data/charset_36.txt).
Then, in the configuration files, set charset_path under the dataset section to the path of your charset file.
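
For illustration only, here is a minimal sketch of how such a charset file could be generated from your training labels. The helper names `build_charset` and `write_charset` are hypothetical and not part of this repo, and the output layout (one `index<TAB>character` entry per line) is an assumption: please open data/charset_36.txt and match whatever layout it actually uses before training.

```python
from pathlib import Path
from typing import Iterable, List


def build_charset(labels: Iterable[str]) -> List[str]:
    """Collect the unique non-whitespace characters that appear in the labels."""
    chars = {ch for label in labels for ch in label if not ch.isspace()}
    # Sort by code point so the character order is stable between runs.
    return sorted(chars)


def write_charset(chars: List[str], path: str) -> None:
    """Write one character per line to a UTF-8 text file.

    ASSUMPTION: each line is "<index><TAB><character>". Check
    data/charset_36.txt and adjust this format if it differs.
    """
    lines = [f"{index}\t{ch}" for index, ch in enumerate(chars)]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Example usage (hypothetical labels and output path):
#   labels = ["你好", "世界"]
#   write_charset(build_charset(labels), "data/charset_zh.txt")
```

Building the charset from the labels you will actually train on keeps it limited to characters the model can see during training, which matters for Chinese because the full character inventory is very large.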

Now you need to train the models. For best performance, SemiMTR uses three training stages, as described here. To pretrain the language model, you need to create a text dataset; you can use this notebook for that.

Please let me know if you have any further questions.