Leave the content of `<code>` and related HTML elements untouched by CommonMark parser

Question

kaushalmodi opened this issue 2 years ago · 15 comments

Hello,

Recently I discovered that CommonMark allows Markdown parsing within <code> blocks!

So if a user had something like <code>**bold**</code> in their Markdown content (which is analogous to Markdown `**bold**`), the CommonMark parser would parse those asterisks as emphasis markup.

This can be reproduced at least with this CommonMark dingus

Can the spec be updated so that the content inside <code> (and also <kbd>, <samp> and <var>) is treated verbatim, as is already done for the <pre> element?

<code> element

displays its contents styled in a fashion intended to indicate that the text is a short fragment of computer code

Now, if that short fragment happens to have Markdown markup characters, we don't want a Markdown parser to render those as Markdown!

<kbd> element

represents a span of inline text denoting textual user input from a keyboard, voice input, or any other text entry device

The textual user input on the keyboard can contain asterisks, underscores and square brackets too. We wouldn't want a Markdown parser to interpret those!

<samp> element

used to enclose inline text which represents sample (or quoted) output from a computer program.

What if the computer program is outputting Markdown text? We want this element to show exactly what the program's output was; we don't want a Markdown parser to corrupt the sample of output that the user is trying to preserve in a <samp> element.

<var> element

represents the name of a variable in a mathematical expression or a programming context

Mathematical expressions easily contain asterisks. We don't want the Markdown parsers to touch these elements either!

/cc: @jmooring

Answer 1 · 2022-05-22T18:46:12.000Z

Hmm, if you want code, use markdown code: backticks.
I’ve seen folks depend on these “bugs” (imo features) so I don’t think it should be changed

Answer 2 · 2022-05-22T18:51:46.000Z

use markdown code: backticks.

I could have used the backticks, but I needed to handle cases where I can add my custom HTML annotations to the inline code blocks.

Here's a watered down example:

In Nim, <code class="inline-src language-nim" data-lang="nim">echo "hello"</code> will print
_hello_.

Another use case is syntax highlighting using classes in code tags. Here's a screenshot of what I mean:

(note the inline colored code in there)

Right now, I am doing this by bypassing a commonmark parser.
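
One way to approximate the requested verbatim behavior without bypassing the parser entirely is a pre-processing pass that rewrites Markdown-significant characters inside these elements as numeric character references, since the CommonMark spec states that character references cannot stand in for structural symbols. The following is only a minimal sketch in Python under that assumption: the function name `protect_verbatim_elements` and the choice of escaped characters are hypothetical, and the regular expression is not a real HTML parser (it would also rewrite `<code>` text that appears inside backtick spans or fenced blocks).

```python
import re

# Inline elements named in the issue whose text should survive unchanged.
VERBATIM_TAGS = ("code", "kbd", "samp", "var")

# Assumed set of characters to neutralize, mapped to numeric character
# references (for example "*" becomes "&#42;").
ESCAPES = {ch: f"&#{ord(ch)};" for ch in "*_[]`\\"}

# Naive matcher for <tag ...>body</tag>; hypothetical, not a full HTML parser.
ELEMENT = re.compile(
    r"(?P<open><(?P<tag>%s)\b[^>]*>)(?P<body>.*?)(?P<close></(?P=tag)>)"
    % "|".join(VERBATIM_TAGS),
    re.IGNORECASE | re.DOTALL,
)


def protect_verbatim_elements(markdown: str) -> str:
    """Escape Markdown markup characters inside code/kbd/samp/var elements."""

    def escape(match: re.Match) -> str:
        body = "".join(ESCAPES.get(ch, ch) for ch in match.group("body"))
        return match.group("open") + body + match.group("close")

    return ELEMENT.sub(escape, markdown)


print(protect_verbatim_elements("<code>**bold**</code>"))
```

The output of this pass would then be handed to the CommonMark parser as usual; text outside the four elements is left untouched.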

Answer 3 · 2022-05-22T18:55:07.000Z

@wooorm

seen folks depend on these “bugs”

Can you point to a use case where people would want a Markdown parser to render stuff inside <code> blocks?

Answer 4 · 2022-05-22T19:09:09.000Z

To add links to code for example

Answer 5 · 2022-05-22T19:13:04.000Z

This is the way Markdown has traditionally done it, starting with Markdown.pl.

https://babelmark.github.io/?text=%3Ccode%3E%0A**a**%0A%3C%2Fcode%3E%0A

Answer 6 · 2022-05-22T19:14:27.000Z

@wooorm I don't follow. The code element is for inline code. The current CommonMark behavior is inconsistent between block code (<pre> elements) and inline code (<code> elements). If the user meant to have Markdown links, they might as well just put those outside the HTML code elements.

Answer 7 · 2022-05-22T19:16:28.000Z

@jgm

This is the way Markdown has always done it:

I understand, but maybe it's our opportunity to fix that? I cannot find any way in Markdown to write inline code with HTML attributes like so:

In Nim, <code class="inline-src language-nim" data-lang="nim">echo "hello"</code> will print
_hello_.

Answer 8 · 2022-05-22T19:18:50.000Z

A strong degree of compatibility with existing implementations was a design goal.

I do see why this prevents you from doing what you want to do here. It's not really a problem in pandoc, for example, where you can just do

`echo "hello"`{.inline-src .language-nim lang=nim}

or use the raw attribute

`<code class="inline-src language-nim" data-lang="nim">echo "hello"</code>`{=html}

But there does seem to be an expressive gap here in core commonmark.

Answer 9 · 2022-05-22T19:31:08.000Z

they might as well just put those outside the HTML code elements.

With the current state, you can mark certain parts of the code as important, or link them. With your proposal, you can’t.

Answer 10 · 2022-05-22T20:07:34.000Z

@jgm Unfortunately I am not using pandoc. I am using the Go Commonmark parser called Goldmark through Hugo (static site generator).

@wooorm

With the current state, you can mark certain parts of the code as important, or link them. With your proposal, you can’t.

Note that this will affect only inline code. You cannot do those things in fenced code blocks or <pre> blocks anyway. If anything, this will bring consistency between inline code and code blocks.

Again, an example of this in the wild would be useful.

Answer 11 · 2022-05-22T20:12:06.000Z

Bringing complete consistency is impossible: HTML in markdown is a black box. It “sniffs” things that look like XML and switches to a different state based on it starting with <style or so. It’s not an actual complete parser. Some consistency could be added but as mentioned above the compatibility is important.

Some examples: <code>c + *y*</code>, to emphasise y, and <code>[myId](#some-href) := someProduction</code>
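
The "sniffing" described above can be illustrated with a simplified version of the first HTML block start condition in the CommonMark spec: a line that begins (after at most three spaces) with `<pre`, `<script`, `<style` or `<textarea`, followed by whitespace, `>` or the end of the line, opens a raw block. The sketch below is an approximation in Python for illustration only; the function name `starts_raw_html_block` is hypothetical, and the spec's other start conditions and the end conditions are omitted.

```python
import re

# Simplified form of CommonMark HTML block start condition 1:
# 0-3 spaces of indentation, one of four tag names (case-insensitive),
# then a space, tab, ">" or the end of the line.
RAW_BLOCK_START = re.compile(
    r"^ {0,3}<(?:pre|script|style|textarea)(?:[ \t>]|$)",
    re.IGNORECASE,
)


def starts_raw_html_block(line: str) -> bool:
    """Return True if the line would open a raw (verbatim) HTML block."""
    return RAW_BLOCK_START.match(line) is not None


for sample in ("<pre>", "<style media=screen>", "<code>**bold**</code>"):
    print(sample, starts_raw_html_block(sample))
```

Because `code`, `kbd`, `samp` and `var` are not in that list, a line such as `<code>**bold**</code>` is handled as an ordinary paragraph containing inline HTML tags, and the text between the tags goes through normal inline parsing.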

Answer 12 · 2022-05-22T20:14:21.000Z

In HTML, <pre> is for preformatted content (i.e. special treatment of whitespace), but it can still contain other elements; <code> isn’t restricted like that at all. So why should it be in Markdown?

Answer 13 · 2022-11-26T05:26:13.000Z

Hi there, we just noticed our Graphviz docs (hugo/goldmark-based) have smart quotes, breaking copy/paste (Graphviz docs issue), where we're using <code> blocks instead of backticks so that we can link sections inside the code block:

<code>[mode](/docs/attrs/mode/)="hier"</code>

Shows these smart quotes, unintentionally:

We seem to be stuck between bad options: we rely on some Markdown processing inside our code tag (to generate the <a> tag), but we don't want the smart-quotes processing. Our choices are disabling links inside our code sections, disabling the smart-quotes feature entirely (even outside of code blocks), or using two adjacent code sections (which causes excessive padding issues). None of these options are appetising! Could anyone suggest another workaround that lets us keep intra-code-block links?

Answer 14 · 2022-11-26T05:34:57.000Z

@mhansen you both want and don't want markdown processing inside the code span.
I'd suggest a different approach. A simple one would be

<code>[mode](/docs/attrs/mode/)</code>`="hier"`

Or even better

[`mode`](/docs/attrs/mode/)`="hier"`

Answer 15 · 2022-11-26T05:47:34.000Z

Thanks for the workaround. We'll need to remove left & right padding and the code-block rounded corners from our code blocks so they don't have extra padding and rounded corners, like the bottom here:

We can probably get away without the padding and rounded corners, though it won't look as nice; we might end up doing that:

FWIW, I'd be very happy to give up all markdown parsing in <code> blocks; it's very easy to write out the <code><a href="...">foo</a>=bar</code> but, I think, tougher to make two side-by-side code blocks look nice in CSS.
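
Writing out the `<a>` tags by hand could also be automated with a small pre-processing step. The following Python sketch expands Markdown-style links found inside `<code>` elements into explicit anchors and rewrites double quotes as `&#34;`. The function name `expand_links_in_code` is hypothetical, the link pattern covers only simple `[text](url)` links without titles, and the idea that a smart-quotes extension leaves `&#34;` alone is an assumption that should be checked against the parser in use.

```python
import re

# Naive matchers; hypothetical helpers, not a full HTML or Markdown parser.
CODE_ELEMENT = re.compile(
    r"(?P<open><code\b[^>]*>)(?P<body>.*?)(?P<close></code>)", re.DOTALL
)
MD_LINK = re.compile(r"\[(?P<text>[^\]]+)\]\((?P<href>[^)\s]+)\)")


def expand_links_in_code(markdown: str) -> str:
    """Turn [text](url) inside <code> into <a> tags and escape double quotes."""

    def rewrite(match: re.Match) -> str:
        # Escape quotes first so the quotes of the generated href stay literal.
        body = match.group("body").replace('"', "&#34;")
        # The href is inserted as written; no URL escaping is attempted here.
        body = MD_LINK.sub(
            lambda m: '<a href="%s">%s</a>' % (m.group("href"), m.group("text")),
            body,
        )
        return match.group("open") + body + match.group("close")

    return CODE_ELEMENT.sub(rewrite, markdown)


print(expand_links_in_code('<code>[mode](/docs/attrs/mode/)="hier"</code>'))
```

With this approach the source files could keep the compact link syntax while the text handed to the parser no longer depends on Markdown processing inside the element.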