Is it possible to train multiple languages in one model file?
lomograb opened this issue · 6 comments
lomograb commented
Is it possible to train multiple languages in one model file?
amitdo commented
Yes, but each script should be on its own line, i.e. a single text line should not mix scripts.
amitdo commented
For mixed scripts in the same line see this paper:
https://www.researchgate.net/publication/280777013_A_Sequence_Learning_Approach_for_Multiple_Script_Identification
lomograb commented
Does CLSTM support this (mixed scripts in the same line)?
amitdo commented
It's not supported out-of-the-box, but you can implement what's described in that paper with clstm.
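A rough sketch of how the ground truth for such a script-identification model could be prepared, assuming (as I read the paper) that every character in the transcription is replaced by a label for its script, so that the network learns to output script labels instead of characters. The one-letter labels and the function names below are made up for illustration and are not part of clstm. Scripts are guessed from Unicode character names, so typeface-level distinctions that share code points (Fraktur is normally encoded with ordinary Latin letters) would need labels from another source.

```python
import unicodedata

# Hypothetical one-letter class labels; this scheme is an assumption,
# not something defined by clstm or by the paper.
SCRIPT_LABELS = {
    "LATIN": "L",
    "GREEK": "G",
    "CYRILLIC": "C",
    "ARABIC": "A",
    "SYRIAC": "S",
}


def script_label(ch):
    """Return the label for one character, or None if it has no listed script."""
    # Unicode names start with the script, e.g. "GREEK SMALL LETTER ALPHA".
    name = unicodedata.name(ch, "")
    prefix = name.split(" ", 1)[0]
    return SCRIPT_LABELS.get(prefix)


def to_script_labels(text):
    """Replace each character with its script label.

    Characters without a listed script (spaces, digits, punctuation)
    are kept unchanged.
    """
    return "".join(script_label(ch) or ch for ch in text)


if __name__ == "__main__":
    print(to_script_labels("abc αβγ где"))
```

The relabeled lines would then take the place of the normal transcriptions when training, with the line images left unchanged.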
lomograb commented
Thank you @amitdo for replying, and for this great project too. Okay, I'm going to close this issue.
mittagessen commented
As a note, there is a model that does script identification exactly as described in the article (arrived at independently) in kraken-models. It can differentiate between Arabic, Syriac, Cyrillic, Greek, Latin, and Fraktur.
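One possible way to use the output of a script-identification model is to cut a mixed line into single-script runs, so that each run can be passed to a recognizer trained on that script. The sketch below assumes the predicted labels are already aligned one-to-one with positions in the line, which a real model will not give you for free. `split_runs` is a hypothetical helper, not a kraken or clstm function.

```python
from itertools import groupby


def split_runs(text, labels):
    """Split text into (label, substring) runs of identical labels.

    Assumes labels[i] is the predicted script label for text[i].
    """
    if len(text) != len(labels):
        raise ValueError("text and labels must have the same length")
    runs = []
    pos = 0
    for label, group in groupby(labels):
        n = len(list(group))  # length of this run of equal labels
        runs.append((label, text[pos:pos + n]))
        pos += n
    return runs


if __name__ == "__main__":
    for label, chunk in split_runs("abc αβγ", "LLL GGG"):
        print(repr(label), repr(chunk))
```

In practice, neutral characters such as spaces would probably be merged into a neighbouring run rather than kept as runs of their own.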