Question about other languages support
Thanks for open-sourcing this. I want to test how well the model recognizes Chinese text. What changes should I make?
Hi,
It requires two small changes, though you will of course also need to create a Chinese text dataset and re-train the models.
As for the changes, you should create a .txt file that contains your character set (see the example in data/charset_36.txt).
Then, in the configuration files, you should change charset_path under the dataset section to be path_to_your_charset.
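As a rough illustration of the first change, here is a minimal sketch for collecting the character set from your Chinese training labels. The helper names (build_charset, write_charset) are hypothetical, and the output layout of one index, a tab, and one character per line is an assumption; please check data/charset_36.txt and match whatever format it actually uses.

```python
from pathlib import Path
from typing import Iterable, List


def build_charset(labels: Iterable[str]) -> List[str]:
    """Return the sorted unique characters found in the labels, skipping whitespace."""
    chars = {ch for label in labels for ch in label if not ch.isspace()}
    return sorted(chars)


def write_charset(chars: List[str], path: str) -> None:
    """Write one 'index<TAB>character' pair per line.

    ASSUMPTION: this layout is a guess at the charset file format;
    compare with data/charset_36.txt before using the result.
    """
    lines = [f"{i}\t{ch}" for i, ch in enumerate(chars)]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

You would call build_charset on all the text labels in your dataset, pass the result to write_charset with the output path you want, and then point charset_path in the dataset section of the configuration files at that path.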
Now, you need to train the models. To maximize performance, SemiMTR requires three training stages, as described here. To pretrain the language model, you should create a text dataset; you can use this notebook for that.
Please let me know if there are any further questions.