BingLingGroup/autosub

Py-googletrans doesn't have a proper built-in translation text length limit

kumaranjeya opened this issue · 5 comments

Input is a subtitles file.

Translating text from "zh-cn" to "en".
Translation: 100% |####################################################################################| Time:  0:00:01
Error: Translation failed.

I already tried other languages such as French, Portuguese, and Spanish, which all work. Only Chinese Simplified gives an error.

The "Translation failed" message only appears when the following condition is true:

if not translated_text or len(translated_text) != len(text_list):

If you don't mind, could you upload a failed subtitles file for me to test it out? It will be faster for me to figure out what is happening. I guess it's another bug.


I tested it. It seems py-googletrans doesn't handle the case where a single translation text is too long. Although my program checks the length, the text is still too long when it contains full-width characters.

To be specific: at the beginning, I wanted to make as few translation requests as possible, so I combine multiple lines of subtitle text into a single large text per request. Then I looked for the text length limit of a single request. According to py-googletrans, the limit is 15k. To be conservative, and based on my understanding that translate.google.com has a 5000-character limit, I set the size limit to 4000.
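The batching idea described above can be sketched roughly as follows. This is only an illustration, not the actual autosub code: the function name `batch_lines`, the `sep` parameter, and the greedy grouping strategy are assumptions; only the 4000 limit comes from the comment above.

```python
def batch_lines(lines, size_limit=4000, sep="\n"):
    """Greedily group subtitle lines into batches whose joined length
    stays within size_limit (hypothetical helper, not autosub's API)."""
    batches = []
    current = []
    current_len = 0
    for line in lines:
        # Length this line adds, plus a separator if the batch is non-empty.
        added = len(line) + (len(sep) if current else 0)
        if current and current_len + added > size_limit:
            # Adding the line would exceed the limit: start a new batch.
            batches.append(current)
            current = [line]
            current_len = len(line)
        else:
            current.append(line)
            current_len += added
    if current:
        batches.append(current)
    return batches


lines = ["a" * 3000, "b" * 900, "c" * 200]
batches = batch_lines(lines)
# Each batch would then be joined with sep and sent as one request.
```

Note that in this sketch a single line longer than the limit still ends up in a batch of its own, so it would need separate handling.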

But somehow that is still too large for text containing full-width characters. Stranger still, even after lowering the limit to 2000 for full-width characters, it was still not enough. So I set it to 1000, and it finally works. Now the program checks whether a text contains a full-width character. If so, its size counts as four times its length.

This may make translation slower. If you want faster translation, you can manually control the sleep time between two translation requests with the -slp option.

Commit f0b0ec3 should fix this issue. Thanks for your feedback.