jgm/djot

Inline container precedence with attributes

hellux opened this issue · 2 comments

hellux commented

In djot.js, for example:

*{a="*"}

yields

<p><span a="*">*</span></p>

while

*{a="*"

yields

<p><strong>{a=&rdquo;</strong>&rdquo;</p>

The general inline precedence rule says that "the first opener that gets
closed takes precedence". In this case, however, the attributes take precedence
even though the closing } comes after the second *.

The first goal of djot is to allow parsing without backtracking, but can this
case really be parsed without any backtracking? When encountering the second *,
we have to consider two possible outcomes:

  • the attributes are closed later and we should ignore the *, or
  • the attributes are not closed later and we must close the *.

How to handle this specific case isn't specified in the syntax reference. The
djot.js behavior is probably more user-friendly than following the general rule,
since one would not expect quoted symbols to have any effect. But from an
implementation point of view, it seems difficult to give attributes priority
while allowing arbitrary content in quoted attribute values without backtracking.
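The two outcomes above can be illustrated with a small lookahead-and-fallback scanner. This is a minimal TypeScript sketch under stated assumptions: the attribute grammar is a simplified toy (keys and bare values are `[A-Za-z0-9_-]` runs, quoted values may contain anything except an unescaped `"`), and `scanAttributes` and `strongDelimiters` are hypothetical names, not djot.js functions. Each `{` is first tried as an attribute block; only if that scan fails is the `{` treated as literal text, and the characters the scan already examined are visited again. That re-examination is the backtracking in question.

```typescript
// Hypothetical toy grammar for an attribute block (NOT djot's full syntax):
//   "{" ( " " | key "=" value )* "}"
// key and bare values are runs of [A-Za-z0-9_-]; a quoted value is "..."
// with backslash escapes and may contain any other character, including "*".
const WORD = /[A-Za-z0-9_-]/;

function scanWord(src: string, i: number): number {
  while (i < src.length && WORD.test(src[i])) i++;
  return i;
}

// Lookahead: returns the index just past the closing "}", or null on failure.
function scanAttributes(src: string, start: number): number | null {
  if (src[start] !== "{") return null;
  let i = start + 1;
  while (i < src.length) {
    if (src[i] === "}") return i + 1;
    if (src[i] === " ") { i++; continue; }
    const keyEnd = scanWord(src, i);
    if (keyEnd === i || src[keyEnd] !== "=") return null;
    i = keyEnd + 1;
    if (src[i] === '"') {
      i++; // skip opening quote
      while (i < src.length && src[i] !== '"') {
        i += src[i] === "\\" ? 2 : 1; // skip escaped characters
      }
      if (i >= src.length) return null; // unterminated quoted value
      i++; // skip closing quote
    } else {
      const valueEnd = scanWord(src, i);
      if (valueEnd === i) return null;
      i = valueEnd;
    }
  }
  return null; // input ended before "}"
}

// Attributes take precedence over emphasis: each "{" is first tried as an
// attribute block, and a "*" inside a successful block is never a delimiter.
// On failure the "{" is literal text and the loop resumes right after it,
// re-examining the characters the failed scan already looked at.
function strongDelimiters(src: string): number[] {
  const candidates: number[] = [];
  let i = 0;
  while (i < src.length) {
    if (src[i] === "{") {
      const end = scanAttributes(src, i);
      if (end !== null) {
        i = end; // skip the whole attribute block
        continue;
      }
    } else if (src[i] === "*") {
      candidates.push(i);
    }
    i++;
  }
  return candidates;
}

console.log(strongDelimiters('*{a="*"}')); // [ 0 ]
console.log(strongDelimiters('*{a="*"'));  // [ 0, 5 ]
```

A `*` hidden inside a closed attribute block never becomes a candidate, so `*{a="*"}` leaves a single unpaired `*`, while `*{a="*"` leaves two candidates that an emphasis matcher could pair into `<strong>`.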

Not sure whether it is intentional or not, but djot.js does not seem to allow
completely arbitrary content within the quotes, e.g.

*{a=[txt](url)

turns into

<p>*{a=&ldquo;[txt](url)</p>

instead of

<p>*{a=<a href="url">txt</a></p>

jgm commented

The rules you're quoting are said to cover "precedence for inline containers." The way I was thinking of it, that excludes things like code spans and attributes. These have precedence over inline containers. In a full spec, all of this would need to be spelled out more explicitly.

Maybe the other issue is now fixed? With the latest in main I'm getting:

% ./djot 
*{a=[txt](url)
<p>*{a=<a href="url">txt</a></p>

hellux commented

> The rules you're quoting are said to cover "precedence for inline containers." The way I was thinking of it, that excludes things like code spans and attributes. These have precedence over inline containers. In a full spec, all of this would need to be spelled out more explicitly.

Yes, and this precedence is probably the better alternative, but it does require backtracking to parse: you have to parse for attributes first, and if that fails, go back and parse everything again. Still, I guess it should be parsable in linear time, since it is not really possible to nest arbitrarily many attributes within each other (trying to open another quote/comment will close the previous one).

> Maybe the other issue is now fixed? With the latest in main I'm getting:
>
> % ./djot
> *{a=[txt](url)
> <p>*{a=<a href="url">txt</a></p>

I think I accidentally typed the example without quotes. With quotes it still parses as text on the main branch:

% ./djot
*{a="[txt](url)
<p>*{a=&ldquo;[txt](url)</p>
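One possible fallback behavior, the one the original report expected, is to resume normal inline parsing right after a `{` whose attribute scan failed, so that a link inside an unterminated quote is still recognized. This TypeScript sketch shows that idea under assumptions: a simplified toy attribute grammar, links limited to `[text](url)` with no nesting or escaping, no smart punctuation (so `"` stays literal instead of becoming `&ldquo;`), no HTML escaping, and recognized attribute blocks simply dropped. `scanAttributes`, `scanLink`, and `render` are hypothetical helpers, not djot.js APIs.

```typescript
// Hypothetical toy attribute grammar (NOT djot's full syntax):
//   "{" ( " " | key "=" value )* "}", quoted values may contain anything
//   except an unescaped '"'.
const WORD = /[A-Za-z0-9_-]/;

function scanWord(src: string, i: number): number {
  while (i < src.length && WORD.test(src[i])) i++;
  return i;
}

// Returns the index just past the closing "}", or null if the block fails.
function scanAttributes(src: string, start: number): number | null {
  if (src[start] !== "{") return null;
  let i = start + 1;
  while (i < src.length) {
    if (src[i] === "}") return i + 1;
    if (src[i] === " ") { i++; continue; }
    const keyEnd = scanWord(src, i);
    if (keyEnd === i || src[keyEnd] !== "=") return null;
    i = keyEnd + 1;
    if (src[i] === '"') {
      i++;
      while (i < src.length && src[i] !== '"') i += src[i] === "\\" ? 2 : 1;
      if (i >= src.length) return null; // unterminated quoted value
      i++;
    } else {
      const valueEnd = scanWord(src, i);
      if (valueEnd === i) return null;
      i = valueEnd;
    }
  }
  return null;
}

// Toy link syntax: "[text](url)" with no nesting or escapes.
function scanLink(src: string, start: number): { text: string; url: string; end: number } | null {
  const close = src.indexOf("]", start + 1);
  if (close < 0 || src[close + 1] !== "(") return null;
  const paren = src.indexOf(")", close + 2);
  if (paren < 0) return null;
  return { text: src.slice(start + 1, close), url: src.slice(close + 2, paren), end: paren + 1 };
}

function render(src: string): string {
  let out = "";
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    if (c === "{") {
      const end = scanAttributes(src, i);
      if (end !== null) {
        i = end; // attribute block recognized (dropped in this toy renderer)
        continue;
      }
    } else if (c === "[") {
      const link = scanLink(src, i);
      if (link !== null) {
        out += `<a href="${link.url}">${link.text}</a>`;
        i = link.end;
        continue;
      }
    }
    // Literal text. After a failed attribute scan only the "{" is emitted
    // here; parsing then continues, so a later "[txt](url)" becomes a link.
    out += c;
    i++;
  }
  return `<p>${out}</p>`;
}

console.log(render('*{a="[txt](url)')); // <p>*{a="<a href="url">txt</a></p>
```

Whether djot should behave this way is the open question in this thread; the sketch only shows that a failed attribute scan can hand the skipped text back to the normal inline parser instead of emitting it verbatim.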