mozilla/pontoon

RTL translations that start with LTR characters may be mis-rendered

As discussed on #l10n-community:

@bcolsson:

I have a question about how best to handle a bidirectional bug I'm encountering for a new RTL locale (Saraiki) in Firefox. A string is displaying incorrectly in the browser, and I believe it's being caused by the placeholder at the very beginning of the string. My assumption is that because the string starts with LTR text, the bidi algorithm treats it as an LTR string, so the placeholder appears on the left and the remaining RTL text appears afterwards.

Is this something best fixed on the backend, or is there a special character that can be added in Pontoon at the beginning of the string that can be used to flag the whole string as RTL?
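
The assumption above matches the "first strong character" heuristic that the Unicode Bidirectional Algorithm uses when nothing else sets the paragraph direction. A minimal sketch of that heuristic, assuming a simplified view that ignores isolates and embeddings; `first_strong_direction` and the sample text are hypothetical and not part of Pontoon or Firefox:

```python
import unicodedata


def first_strong_direction(text):
    """Return "ltr", "rtl", or None, based on the first strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # Only neutral or weak characters (digits, spaces, punctuation) were found.
    return None


# Hypothetical rendered string: a Latin brand name followed by Arabic-script text.
rendered = "Firefox \u0645\u0631\u062d\u0628\u0627"
print(first_strong_direction(rendered))             # ltr
print(first_strong_direction("\u200f" + rendered))  # rtl
```

The second call shows why a leading U+200F changes the outcome: it is itself a strong RTL character.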

@eemeli:

You're looking for the right-to-left mark: https://www.compart.com/en/unicode/U+200F
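
For reference, the properties of that code point can be checked with Python's standard `unicodedata` module. This is only an illustrative lookup, not something Pontoon is known to do:

```python
import unicodedata

RLM = "\u200f"

print(unicodedata.name(RLM))           # RIGHT-TO-LEFT MARK
print(unicodedata.category(RLM))       # Cf (an invisible format character)
print(unicodedata.bidirectional(RLM))  # R (strong right-to-left)
```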

@ItielMaN:

What @eemeli said. As a rule of thumb, I always start a string with an RLM when I know it's going to begin with an LTR character.
At least on the Hebrew keyboard, Alt+) inserts this character. This works on Windows 8+, and I'm told it also works on Linux. Not sure about macOS.
Unfortunately I don't think this works on other locales... I did find this though, it may be helpful:
https://stackoverflow.com/questions/48487642/are-there-right-to-left-versions-of-unicode-characters-for-period-exclamation-m

In general I consider using RLM/LRM a hacky method, but it works every single time.

@eemeli:

I wonder if it might make sense for Pontoon to always insert an RLM at the start of translations when:

  1. The target locale's script is RTL,
  2. The first strongly directional character is LTR, and
  3. The translation includes at least some RTL characters.

Is there any situation in which doing so would produce a worse result than currently?
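
The three conditions above could be checked roughly as follows. This is a sketch only: `needs_leading_rlm` and the `locale_is_rtl` flag are hypothetical names, and how Pontoon would actually determine a locale's direction is not specified in this discussion.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}


def needs_leading_rlm(translation, locale_is_rtl):
    """Return True if all three proposed conditions hold."""
    # Condition 1: the target locale's script is RTL.
    if not locale_is_rtl:
        return False
    # Keep only the bidi classes of strongly directional characters.
    strong = [
        cls
        for cls in map(unicodedata.bidirectional, translation)
        if cls == "L" or cls in RTL_CLASSES
    ]
    if not strong:
        return False
    # Condition 2: the first strong character is LTR.
    # Condition 3: at least one RTL character is present.
    return strong[0] == "L" and any(cls in RTL_CLASSES for cls in strong)


hebrew = "\u05e2\u05d1\u05e8\u05d9\u05ea"
print(needs_leading_rlm("Firefox " + hebrew, True))        # True
print(needs_leading_rlm("Firefox", True))                  # False: no RTL characters
print(needs_leading_rlm("\u200fFirefox " + hebrew, True))  # False: RLM is the first strong character
```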

@ItielMaN:

This makes sense, but there could be rare edge cases I can't think of right now that may make this worse.
Besides, what would happen if the user regrets a translation suggestion and hits Ctrl+A + backspace to remove all of the translation, and re-words the translation? Assuming you mean to add RLM to the textbox, the RLM would get lost that way.
And if you're suggesting to add RLM only AFTER the user submits a suggestion, this could potentially add another RLM if the user already added it from muscle memory. Though I guess you can detect that too, and remove it post-submission if 2 consecutive identical markers are detected...
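
A sketch of the post-submission cleanup mentioned above, assuming it only needs to collapse repeated RLMs at the very start of the string; `collapse_leading_rlm` is a hypothetical helper:

```python
import re

RLM = "\u200f"

# Two or more RLM characters at the start of the string.
_LEADING_RLMS = re.compile(f"^{RLM}{{2,}}")


def collapse_leading_rlm(text):
    """Replace a run of leading RLMs with a single RLM."""
    return _LEADING_RLMS.sub(RLM, text)


print(repr(collapse_leading_rlm(RLM + RLM + "foo")))  # '\u200ffoo'
print(repr(collapse_leading_rlm(RLM + "foo")))        # '\u200ffoo'
```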

This kind of logic is fine as long as the user is aware of it, but I don't think localizers would know what happens behind the scenes.
Something to think about that could satisfy all needs would be to add a dropdown somewhere in the editor for inserting special characters (also for other locales; I don't know what others may need besides RLM/LRM), and have the editor detect them and show them in the textbox, like this:
https://tomer.github.io/pilcrow/
But this sounds like a long-term project...

@eemeli:

I was more thinking of the leading RLM as a feature of the storage format, so it'd be completely absent from the string when it's being edited in Pontoon. In Pontoon, we already present RTL locales with the editor having dir=rtl set on it, and in many places in the UI that'll be true as well, so even all-English content may get rendered with a right-to-left paragraph direction.

If an RLM is manually included at the start of a string, then my condition 2 above would not apply, and so we wouldn't add a second one. But we would probably leave it out when displaying. So if you submit <RLM>foo [RTL content], it'd get stored as <RLM>foo [RTL content] and shown later in Pontoon as foo [RTL content]; editing that to bar [RTL content] and submitting would store <RLM>bar [RTL content].
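
The round trip described here could look something like the sketch below. It assumes the locale is already known to be RTL, and `to_storage` and `to_editor` are hypothetical names rather than existing Pontoon functions:

```python
import unicodedata

RLM = "\u200f"


def _strong_classes(text):
    """Bidi classes of the strongly directional characters in text."""
    return [
        cls
        for cls in map(unicodedata.bidirectional, text)
        if cls in ("L", "R", "AL")
    ]


def to_storage(edited):
    """Add a leading RLM when the text starts LTR but contains RTL characters."""
    strong = _strong_classes(edited)
    starts_ltr = bool(strong) and strong[0] == "L"
    has_rtl = any(cls != "L" for cls in strong)
    return RLM + edited if starts_ltr and has_rtl else edited


def to_editor(stored):
    """Hide a single leading RLM when showing the string for editing."""
    return stored[len(RLM):] if stored.startswith(RLM) else stored


rtl = "\u05e2\u05d1\u05e8\u05d9\u05ea"
stored = to_storage(RLM + "foo " + rtl)    # a manual RLM is kept, not doubled
shown = to_editor(stored)                  # "foo " + rtl, without the RLM
restored = to_storage("bar " + rtl)        # the RLM is added back on submit
print(stored == RLM + "foo " + rtl, shown == "foo " + rtl, restored == RLM + "bar " + rtl)
```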

@ItielMaN:

Ah, I see, you're right. I think this is worth experimenting with, but I still think the user should somehow be made aware that this is happening to the string they are working on, as it could lead to confusion.
Also, how could the user opt out of this, in the rare edge cases I can't think of right now, in which the RLM would do more harm than good?

@eemeli:

The first idea that comes to mind is that if the string starts with an explicit LRM, then we never take it away and never add an RLM before it.
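
A sketch of that opt-out, with `apply_leading_rlm` as a hypothetical name. The explicit check is needed because the LRM itself has bidi class L, so a first-strong test on its own would treat an LRM-prefixed string as starting with LTR text and add the RLM anyway:

```python
import unicodedata

LRM = "\u200e"
RLM = "\u200f"


def apply_leading_rlm(text):
    """Add a leading RLM unless the string opts out with a leading LRM."""
    # Opt-out: an explicit leading LRM is kept as-is and never preceded by an RLM.
    if text.startswith(LRM):
        return text
    strong = [
        cls
        for cls in map(unicodedata.bidirectional, text)
        if cls in ("L", "R", "AL")
    ]
    if strong and strong[0] == "L" and any(cls != "L" for cls in strong):
        return RLM + text
    return text


rtl = "\u05e2\u05d1\u05e8\u05d9\u05ea"
print(apply_leading_rlm(LRM + "foo " + rtl) == LRM + "foo " + rtl)  # True: left untouched
print(apply_leading_rlm("foo " + rtl) == RLM + "foo " + rtl)        # True: RLM added
```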

As for user notification, we could show some sort of indicator in the UI when the string being edited fits the criteria, and maybe include there info about how to disable the RLM insertion.