Carriage returns disappear from translations
peurKe opened this issue · 1 comments
Hi,
I'm having a problem with line-break characters: carriage return (\r) and line feed (\n).
I want to translate sentences containing CARRIAGE RETURN characters that can be located right in the middle of sentences.
I use the following parameters to ask DeepL not to take CARRIAGE RETURN characters into account:
import deepl
translator = deepl.Translator('my_auth_key')
text = "Отпустите рюкзак\r\nи он автоматически\r\nвернется на место"
translation = translator.translate_text(
    text,
    source_lang=deepl.Language.RUSSIAN,
    target_lang=deepl.Language.ENGLISH_AMERICAN,
    formality=deepl.Formality.PREFER_LESS,
    split_sentences=deepl.SplitSentences.NO_NEWLINES,
    # preserve_formatting=True  # Doesn't help, even when uncommented
).text
The meaning of the translations returned by the API through the DeepL Python library is correct, but the line-break characters are missing from the results. Setting preserve_formatting=True makes no difference.
I also tried \n instead of \r\n, with the same result.
text : 'Отпустите рюкзак\r\nи он автоматически\r\nвернется на место'
translation : 'Let go of the backpack and it will automatically return to its place'
text : 'Отпустите рюкзак\nи он автоматически\nвернется на место'
translation : 'Let go of the backpack and it will automatically return to its place'
Here's what I expect to happen:
text : 'Отпустите рюкзак\r\nи он автоматически\r\nвернется на место'
translation : 'Let go of the backpack\r\nand it will automatically\r\nreturn to its place'
text : 'Отпустите рюкзак\nи он автоматически\nвернется на место'
translation : 'Let go of the backpack\nand it will automatically\nreturn to its place'
Am I doing something wrong?
Thanks in advance
The workaround I've found is to enable XML tag handling, put a <w><x>LF</x></w> placeholder in place of each line break, and tell DeepL to ignore the <x> tag with the ignore_tags parameter. I can then replace the placeholders with line-break characters afterwards:
translation = translator.translate_text(
    text,
    source_lang=deepl.Language.RUSSIAN,
    target_lang=deepl.Language.ENGLISH_AMERICAN,
    formality=deepl.Formality.PREFER_LESS,
    split_sentences=deepl.SplitSentences.NO_NEWLINES,
    tag_handling='xml',  # Enable XML tag handling
    ignore_tags=['x'],  # Do not translate the content of <x> tags
).text
text : 'Отпустите рюкзак<w><x>LF</x></w>и он автоматически<w><x>LF</x></w>вернется на место'
translation : 'Let go of the backpack<w><x>LF</x></w>and it will automatically<w><x>LF</x></w>back in place'
Then replace all these <w><x>LF</x></w> placeholders with \n to finally get the result I was expecting:
translation : 'Let go of the backpack\nand it will automatically\nback in place'
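For reference, the placeholder round trip described above could be wrapped up roughly as follows. This is only a sketch: the helper names (protect_newlines, restore_newlines, translate_keeping_newlines) are hypothetical, and the separate CRLF placeholder is an assumption added so that \r\n and \n can be told apart when restoring. It also does not escape characters such as < or & that may already be present in the text.

```python
# Placeholder markup from the workaround above; the CRLF variant is an assumption.
CRLF_PLACEHOLDER = "<w><x>CRLF</x></w>"
LF_PLACEHOLDER = "<w><x>LF</x></w>"


def protect_newlines(text: str) -> str:
    """Replace line breaks with XML placeholders before translation."""
    # Handle \r\n first so that its \n is not replaced on its own.
    text = text.replace("\r\n", CRLF_PLACEHOLDER)
    return text.replace("\n", LF_PLACEHOLDER)


def restore_newlines(text: str) -> str:
    """Turn the placeholders back into the original line-break characters."""
    text = text.replace(CRLF_PLACEHOLDER, "\r\n")
    return text.replace(LF_PLACEHOLDER, "\n")


def translate_keeping_newlines(translator, text: str, **options) -> str:
    """Translate `text` with a deepl.Translator-like object, keeping line breaks.

    `options` is passed through to translate_text (source_lang, target_lang, ...).
    """
    result = translator.translate_text(
        protect_newlines(text),
        tag_handling="xml",  # enable XML tag handling
        ignore_tags=["x"],   # leave the content of <x> untranslated
        **options,
    )
    return restore_newlines(result.text)
```

With a real translator this would be called as translate_keeping_newlines(translator, text, source_lang=deepl.Language.RUSSIAN, target_lang=deepl.Language.ENGLISH_AMERICAN). It still depends on DeepL returning the placeholders unchanged, which is what I observed in the output above.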
But I'd like to avoid using XML tags and let DeepL take care of keeping the \r and \n characters.
Is this possible, or am I doing something wrong?