commonmark/commonmark-spec

Parse link references without knowledge of definitions

chrisjsewell opened this issue · 3 comments

Heya, I would like to understand the rationale behind https://spec.commonmark.org/0.30/#example-568

[foo][bar][baz]

[baz]: /url

<p>[foo]<a href="/url">bar</a></p>

This means that neither users nor parsers can "understand" [foo][bar][baz] in isolation; it can only be interpreted after all definitions within the document have been identified.

For parsers in particular (such as markdown-it and remark), this necessitates a lot of extra complexity to run a "pre-parse" before the document can actually be parsed in full.
In turn, it precludes any kind of streamed or incremental parsing, and makes it impossible to write a good regex-based syntax highlighter (such as a TextMate grammar).
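
To make the dependence concrete, here is a rough sketch of how the same inline text resolves differently depending on which labels are defined. It is my own simplification of the behaviour shown in Example 568, not the spec's actual algorithm: `resolve` and `defined` are hypothetical names, labels are compared verbatim, and nesting, escapes, and collapsed references (`[foo][]`) are ignored.

```python
import re

# Simplified label pattern (assumption): no nesting, no escapes, no empty labels.
LABEL = re.compile(r"\[([^\]]+)\]")


def resolve(inline, defined):
    """Split inline text into ("text", ...) and ("link", text, label) nodes.

    `defined` is the set of labels that have a definition somewhere in the
    document, so it must be known before this function can run.
    """
    out, pos = [], 0
    while pos < len(inline):
        first = LABEL.match(inline, pos)
        if not first:
            out.append(("text", inline[pos]))
            pos += 1
            continue
        second = LABEL.match(inline, first.end())
        if second and second.group(1) in defined:
            # Full reference: [text][label] with a defined label.
            out.append(("link", first.group(1), second.group(1)))
            pos = second.end()
        elif not second and first.group(1) in defined:
            # Shortcut reference: [label] not followed by another label.
            out.append(("link", first.group(1), first.group(1)))
            pos = first.end()
        else:
            # Not a link: emit the bracketed text literally and move on.
            out.append(("text", first.group(0)))
            pos = first.end()
    return out


print(resolve("[foo][bar][baz]", {"baz"}))
print(resolve("[foo][bar][baz]", {"bar", "baz"}))
```

With only `baz` defined, `[foo]` comes out as literal text and `[bar][baz]` as a link; once `bar` is also defined, the grouping shifts to `[foo][bar]` followed by `[baz]`. The inline structure cannot be fixed until the definitions are known.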

I feel the output of this example should be:

<p>[foo][bar]<a href="/url">baz</a></p>

or even just

<p><a href="/url">baz</a></p>

i.e.

  1. There would be a full parse, during which both [foo][bar] and [baz] are captured as link references in the AST.
  2. During this parse, all definitions are also captured.
  3. On conversion to HTML, when encountering the [foo][bar] link reference, with no matched definition, it would be output in its raw (encoded) format, or even just omitted.
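
The three steps above could be sketched roughly as follows. This is a minimal illustration under heavy assumptions, not a proposal for real parser code: the names `parse`, `render`, `Text`, and `LinkRef` are made up for the example, every non-blank line is treated as its own paragraph, and the regexes ignore nesting, escapes, titles, and multi-line definitions.

```python
import html
import re
from dataclasses import dataclass

# Hypothetical, heavily simplified grammar.
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+)\s*$")
REFERENCE = re.compile(r"\[([^\]]+)\](?:\[([^\]]*)\])?")


@dataclass
class Text:
    value: str


@dataclass
class LinkRef:
    raw: str    # original source, kept for fallback rendering
    text: str   # link text
    label: str  # label to look up at render time


def normalize(label):
    # Case-insensitive, whitespace-collapsed label matching.
    return " ".join(label.casefold().split())


def parse(source):
    """Steps 1 and 2: one pass that collects reference nodes and definitions."""
    definitions, paragraphs = {}, []
    for line in source.splitlines():
        m = DEFINITION.match(line)
        if m:
            # Keep the first definition if a label is defined twice.
            definitions.setdefault(normalize(m.group(1)), m.group(2))
            continue
        if not line.strip():
            continue
        nodes, pos = [], 0
        for ref in REFERENCE.finditer(line):
            if ref.start() > pos:
                nodes.append(Text(line[pos:ref.start()]))
            text = ref.group(1)
            label = ref.group(2) or text  # [text] and [text][] use text as label
            nodes.append(LinkRef(ref.group(0), text, label))
            pos = ref.end()
        if pos < len(line):
            nodes.append(Text(line[pos:]))
        paragraphs.append(nodes)
    return paragraphs, definitions


def render(paragraphs, definitions):
    """Step 3: resolve references only when converting to HTML."""
    out = []
    for nodes in paragraphs:
        parts = []
        for node in nodes:
            if isinstance(node, Text):
                parts.append(html.escape(node.value))
                continue
            url = definitions.get(normalize(node.label))
            if url is None:
                parts.append(html.escape(node.raw))  # no definition: raw text
            else:
                parts.append('<a href="%s">%s</a>'
                             % (html.escape(url), html.escape(node.text)))
        out.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(out)


print(render(*parse("[foo][bar][baz]\n\n[baz]: /url")))
```

In this sketch the inline pass never consults the definitions, so `[foo][bar]` is always grouped as one reference node and `[baz]` as another; the lookup only decides whether a node renders as a link or falls back to its raw text.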

Is there any rationale to Example 568 that I am missing?


In fact, the syntax highlighting here on GitHub demonstrates exactly this problem: it cannot "work out" what is a link reference, and incorrectly highlights [foo]:

image

jgm commented

This is exactly the point I made here: https://johnmacfarlane.net/beyond-markdown.html#reference-links
It is one of a number of things I would have done differently if we were not constrained by compatibility with existing markdown behavior.

jgm commented

Oh, and as I say in the article: changing things so that links can be recognized without parsing the whole document means no more "shortcut" links, e.g. [foo]. (Unless you want to recognize everything of that form as a link, which then requires escaping of every literal [ character.) I think many markdownists would regard this as a heavy cost.

Thanks for the link @jgm, that's really interesting, and I'm glad to know that I wasn't completely alone in feeling this 😅

> if we were not constrained by compatibility with existing markdown behavior.

> I think many markdownists would regard this as a heavy cost.

So I guess my question would be: do we have to be constrained by legacy forever, or is there any world where this could have some form of spec compliance? 😬

> Let me be clear up front that I’m not suggesting any change in the goals of the Commonmark project. If these reflections lead to anything, it should probably be an entirely new project under a new name.

Did you ever look into getting any "consensus" over your proposals?