commonmark/commonmark-spec

Bidirectional text support

mskf1383 opened this issue · 10 comments

Support for RTL text is a big problem for speakers of RTL languages (Persian, Arabic, etc.).
This can easily be fixed by adding dir="auto" to each line. GitHub currently does this:

An LTR text

یک متن RTL

<p dir="auto">the text</p>

Please add this to spec.
Thanks!

jgm commented

Adding a dir="auto" attribute to p elements (and presumably h1, td, li, and other elements that contain text) is really a rendering issue, not a parsing issue, so this is not really an issue for the spec, but rather for implementations.

Btw: can't the dir="auto" be placed on the html element so that it affects the whole document?

If so, this is really a templating issue.

It’s complex.

can't the dir="auto" be placed on the html element so that it affects the whole document?

That is possible if the markdown is assumed to have the same directionality as the whole page that hosts it. For content from unknown authors, such as comments, it won’t work, because the page that hosts it could be, say, in English (LTR), while the content is in a language with a different direction.

For why setting dir=auto on a wrapping div wouldn’t work, see the first note in the HTML spec on auto:

The heuristic used by this state is very crude (it just looks at the first character with a strong directionality, in a manner analogous to the Paragraph Level determination in the bidirectional algorithm). […]

That is to say, the first character that isn’t neutral (neutral characters include, e.g., @) defines the direction for everything.
So if you want to let users mix RTL and LTR languages, the best way to go is apparently to add dir=auto to all deepest elements that contain text.
In my previous testing, I believe GH adds dir=auto on: 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'p', 'ol', 'ul'. I am not sure why they chose to do lists ('ol', 'ul') instead of 'li' elements, btw.

Still, this isn’t perfect. See:

Finally, I completely agree with @jgm that this isn’t something CommonMark needs to do, particularly because it doesn’t concern itself with extensions at the rendering level.
Also, there isn’t one great way to do it, so whatever it did would be wrong for some users.
And the alternatives (postprocessing the HTML, by the user or by an implementation) are viable.
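
To make the postprocessing alternative concrete, here is a minimal sketch in Python using only the standard library. It assumes the tag list reported above for GitHub; the names AUTO_DIR_TAGS, DirAutoAdder, and add_dir_auto are made up for this illustration and are not part of any CommonMark implementation. It is also lossy (comments and doctypes are dropped, and script/style content is not handled specially), so treat it as an illustration rather than a production filter.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: mirrors the tag list reported in this thread for GitHub.
AUTO_DIR_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "div", "p", "ol", "ul"}


class DirAutoAdder(HTMLParser):
    """Re-emits HTML, adding dir="auto" to selected tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _open(self, tag, attrs, closer):
        attrs = list(attrs)
        # Leave an explicit dir attribute chosen by the author untouched.
        if tag in AUTO_DIR_TAGS and not any(name == "dir" for name, _ in attrs):
            attrs.append(("dir", "auto"))
        parts = [tag]
        for name, value in attrs:
            # Valueless attributes (e.g. "hidden") arrive with value None.
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape text.
        self.out.append(escape(data, quote=False))


def add_dir_auto(html_fragment):
    """Return the fragment with dir="auto" added to the tags listed above."""
    parser = DirAutoAdder()
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.out)


print(add_dir_auto("<p>Hello</p><p>أنتِ</p>"))
# prints: <p dir="auto">Hello</p><p dir="auto">أنتِ</p>
```

Because each paragraph gets its own dir="auto", the browser resolves the direction per paragraph instead of once for the whole fragment.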

jgm commented

Why won't a wrapping div work? I don't quite follow. Are you saying that if you have

<div dir="auto">
<p>Hello</p>
</div>

the first character with strong directionality will be < rather than H? That seems very crude indeed if that's how it works.

Yep, it is very crude!
To clarify: the first character that has any directionality (so not @ or . or so), e.g., H (LTR), will define the directionality of everything.

So, for:

<div dir="auto">
<p>Hello</p>
<p>أنتِ</p>
</div>

^-- everything will be LTR, because the H is LTR.
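
For illustration, the first-strong-character heuristic can be approximated in a few lines of Python. This is a simplified sketch, not the browser algorithm itself: it works on a plain string and ignores the rule that some descendant elements are skipped when the direction is resolved. The function name first_strong_direction is made up for this example.

```python
import unicodedata

# Bidi classes with strong directionality: L is left-to-right,
# R and AL (Arabic letter) are right-to-left.
_STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}


def first_strong_direction(text):
    """Return "ltr" or "rtl" for the first strong character, else None."""
    for ch in text:
        direction = _STRONG.get(unicodedata.bidirectional(ch))
        if direction is not None:
            return direction
    return None


# Text content of the wrapping div from the example above.
print(first_strong_direction("Hello أنتِ"))  # prints: ltr
# Neutral characters such as "@" and spaces are skipped.
print(first_strong_direction("@ أنتِ"))  # prints: rtl
```

The first call shows why the wrapping div fails: the H is found before any Arabic letter, so the Arabic paragraph is laid out LTR as well.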

To clarify: the first character that has any directionality (so not @ or . or so), e.g., H (LTR), will define the directionality of everything.

This is why dir="auto" should be added to every tag.

jgm commented

This is why dir="auto" should be added to every tag.

Nothing prevents a conforming implementation from doing this. It concerns rendering, so it's not something that affects the spec. So, closing this.

This is why dir="auto" should be added to every tag.

Not exactly. It doesn't work when elements are nested. I am working on a proper guideline for how to implement this correctly.

By the way, thanks for bringing this up here. I tried to follow up on the CommonMark community forum but got no response. Also, since my account is still treated as a new user after years, I cannot discuss it further there because I have reached the limit.

I would ask that you say "bidi" rather than "RTL". This is not an RTL issue.

At least now we know that the CommonMark team sees this as an implementation issue, not a matter for the parsing spec. That is important.

Looking forward to your guidelines!

Btw, at least speaking for myself—but perhaps also @jgm as they opened an issue for this in Pandoc—I’m interested in improving the situation for bidi/RTL/etc users :)