commonmark/commonmark-spec

Leave the content of `<code>` and related HTML elements untouched by CommonMark parser

kaushalmodi opened this issue · 15 comments

Hello,

Recently I discovered that CommonMark allows Markdown parsing within <code> blocks!

So if a user had something like <code>**bold**</code> in their Markdown content (which is analogous to Markdown `**bold**`), the CommonMark parser would parse the asterisks in there.

This can be reproduced at least with this CommonMark dingus

image

Can the spec be updated so that the content inside <code> (and also <kbd>, <samp> and <var>) is treated verbatim, as is already done for the <pre> element?

  • <code> element

    displays its contents styled in a fashion intended to indicate that the text is a short fragment of computer code

    Now, if that short fragment happens to have Markdown markup characters, we don't want a markdown parser to render those as Markdown!

  • <kbd> element

    represents a span of inline text denoting textual user input from a keyboard, voice input, or any other text entry device

    The textual user input on the keyboard can contain asterisks, underscores and square brackets too. We wouldn't want a Markdown parser to interpret those!

  • <samp> element

    used to enclose inline text which represents sample (or quoted) output from a computer program.

    What if the computer program is outputting Markdown text? We want this element to show exactly what the program's output was; we don't want a Markdown parser to corrupt the sample output that the user is trying to preserve in a <samp> element.

  • <var> element

    represents the name of a variable in a mathematical expression or a programming context

    Mathematical expressions easily contain asterisks. We don't want the Markdown parsers to touch these elements either!
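
Until something like this is in the spec, one possible workaround is to pre-process the source before it reaches the parser. The sketch below is only an illustration, not part of CommonMark or of any parser: it rewrites Markdown-significant characters inside the four elements listed above as numeric character references, relying on the spec's rule that character references cannot stand in for structural symbols. The function name `protect_verbatim_elements` and the chosen character set are assumptions, and the regex is deliberately naive (it does not skip backtick code spans or fenced blocks).

```python
import re

# Assumption: these are the characters we want to shield from inline parsing.
MARKDOWN_CHARS = "*_[]`\\"

# Naive matcher for <code>, <kbd>, <samp> and <var> elements (no nesting of
# the same element, no awareness of fenced code blocks).
_ELEMENT_RE = re.compile(
    r"(<(code|kbd|samp|var)\b[^>]*>)(.*?)(</\2\s*>)",
    re.DOTALL | re.IGNORECASE,
)


def _to_char_refs(text):
    """Replace each Markdown-significant character with a numeric reference."""
    return "".join(
        "&#{};".format(ord(ch)) if ch in MARKDOWN_CHARS else ch for ch in text
    )


def protect_verbatim_elements(markdown):
    """Return the source with the content of the listed elements shielded."""
    return _ELEMENT_RE.sub(
        lambda m: m.group(1) + _to_char_refs(m.group(3)) + m.group(4),
        markdown,
    )


if __name__ == "__main__":
    print(protect_verbatim_elements("Use <code>**bold**</code> for **bold**."))
```

Text outside the matched elements is left alone, so ordinary Markdown emphasis elsewhere in the document still works.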

/cc: @jmooring

Hmm, if you want code, use markdown code: backticks.
I’ve seen folks depend on these “bugs” (imo features) so I don’t think it should be changed

use markdown code: backticks.

I could have used the backticks, but I needed to handle cases where I can add my custom HTML annotations to the inline code blocks.

Here's a watered down example:

In Nim, <code class="inline-src language-nim" data-lang="nim">echo "hello"</code> will print
_hello_.

Another use case is syntax highlighting using classes in code tags. Here's a screenshot of what I mean:

image

(note the inline colored code in there)

Right now, I am doing this by bypassing a commonmark parser.

@wooorm

seen folks depend on these “bugs”

Can you point to a use case where people would want a Markdown parser to render stuff inside <code> blocks?

To add links to code for example

jgm commented

This is the way Markdown has traditionally done it, starting with Markdown.pl.

https://babelmark.github.io/?text=%3Ccode%3E%0A**a**%0A%3C%2Fcode%3E%0A
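
For readers who don't want to follow the link, the test input is encoded in the URL's `text` query parameter. A minimal sketch that decodes it with the Python standard library (the helper name `babelmark_input` is made up for this illustration):

```python
from urllib.parse import parse_qs, urlparse


def babelmark_input(url):
    """Return the Markdown source carried in a Babelmark URL's `text` parameter."""
    query = parse_qs(urlparse(url).query)
    return query["text"][0]


if __name__ == "__main__":
    url = "https://babelmark.github.io/?text=%3Ccode%3E%0A**a**%0A%3C%2Fcode%3E%0A"
    # repr() makes the embedded newlines visible.
    print(repr(babelmark_input(url)))
```

The decoded input puts `<code>`, `**a**` and `</code>` on three separate lines.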

@wooorm I don't follow. The code element is for inline code. The current CommonMark behavior is inconsistent between block code (<pre> elements) and inline code (<code> elements). If the user meant to have Markdown links, they might as well just put those outside the HTML code elements.

@jgm

This is the way Markdown has always done it:

I understand, but maybe it's our opportunity to fix that? I cannot find any way in Markdown to write inline code with HTML attributes like so:

In Nim, <code class="inline-src language-nim" data-lang="nim">echo "hello"</code> will print
_hello_.

jgm commented

A strong degree of compatibility with existing implementations was a design goal.

I do see why this prevents you from doing what you want to do here. It's not really a problem in pandoc, for example, where you can just do

`echo "hello"`{.inline-src language-nim lang=nim}

or use the raw attribute

`<code class="inline-src language-nim" data-lang="nim">echo "hello"</code>`{=html}

But there does seem to be an expressive gap here in core commonmark.

they might as well just put those outside the HTML code elements.

With the current state, you can mark certain parts of the code as important, or link them. With your proposal, you can’t.

@jgm Unfortunately I am not using pandoc. I am using the Go Commonmark parser called Goldmark through Hugo (static site generator).

@wooorm

With the current state, you can mark certain parts of the code as important, or link them. With your proposal, you can’t.

Note that this will affect only inline code. You cannot do those things in fenced code blocks or <pre> blocks anyway. If anything, this will bring consistency between inline code and code blocks.

Again, an example of this in the wild would be useful.

Bringing complete consistency is impossible: HTML in markdown is a black box. It “sniffs” things that look like XML and switches to a different state based on it starting with <style or so. It’s not an actual complete parser. Some consistency could be added but as mentioned above the compatibility is important.
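
To make the "sniffing" concrete, here is a rough sketch of one of the start conditions for HTML blocks, as I read the spec: a line that begins (after at most three spaces of indentation) with `<pre`, `<script`, `<style` or `<textarea`, case-insensitively, followed by a space, a tab, `>` or the end of the line, opens a block whose content is passed through raw. The exact tag list has changed between spec versions, so treat it as an assumption; the function name is made up for this illustration.

```python
import re

# Assumption: tag list taken from start condition 1 of the HTML blocks section
# of a recent CommonMark spec; older versions did not include `textarea`.
_RAW_BLOCK_START = re.compile(
    r"^ {0,3}<(pre|script|style|textarea)(?=[ \t>]|$)",
    re.IGNORECASE,
)


def starts_raw_html_block(line):
    """Return True if the line looks like the start of a raw HTML block."""
    return bool(_RAW_BLOCK_START.match(line))


if __name__ == "__main__":
    for line in ["<pre>", "<style", "<code>**a**</code>", "<preface>"]:
        print(repr(line), starts_raw_html_block(line))
```

Note that this only looks at the first characters of a line; nothing is actually parsed as HTML, which is why `<code>` in the middle of a paragraph gets no special treatment.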

Some examples: <code>c + *y*</code>, to emphasise y, and <code>[myId](#some-href) := someProduction</code>

In HTML, <pre> is for preformatted content (i.e. special treatment of whitespace), but it can still contain other elements; <code> isn’t restricted like that at all. So why should it be in Markdown?

Hi there, we just noticed our Graphviz docs (hugo/goldmark-based) have smart quotes, breaking copy/paste (Graphviz docs issue), where we're using <code> blocks instead of backticks so that we can link sections inside the code block:

<code>[mode](/docs/attrs/mode/)="hier"</code>

Shows these smart quotes, unintentionally:

image

We seem to be stuck between two bad places: we rely on some Markdown processing inside our code tag (to generate the <a> tag), but we don't want the smart-quotes processing. Our options are disabling links inside our code sections, disabling the smart-quotes feature entirely (even outside of code blocks), or trying to use two code sections (but that causes excessive padding issues). None of these options is appetising! Could anyone suggest another workaround that lets us keep intra-code-block links?

jgm commented

@mhansen you both want and don't want markdown processing inside the code span.
I'd suggest a different approach. A simple one would be

<code>[mode](/docs/attrs/mode/)</code>`="hier"`

Or even better

[`mode`](/docs/attrs/mode/)`="hier"`

Thanks for the workaround. We'll need to remove the left and right padding and the rounded corners from our code blocks so that adjacent blocks don't show extra padding and rounded corners, like the bottom example here:

image

We can probably get away without the padding and rounded corners, though it won't look as nice. We might end up doing that:

image

FWIW, I'd be very happy to give up all markdown parsing in <code> blocks; it's very easy to write out the <code><a href="...">foo</a>=bar</code> but, I think, tougher to make two side-by-side code blocks look nice in CSS.
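
As an illustration of that last option, here is a small sketch that turns the Markdown-style link in the earlier `mode` example into explicit HTML inside a single `<code>` element. It is not part of Hugo, Goldmark or CommonMark; the function name `code_with_links` and the simple link regex are assumptions, and whether a given typographer extension leaves the escaped quotes alone would need to be checked against that tool.

```python
import html
import re

# Simplified inline-link pattern: [text](url) with no nesting and no titles.
_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def code_with_links(source):
    """Wrap `source` in <code>, converting [text](url) links to <a> tags."""
    parts = []
    pos = 0
    for match in _LINK_RE.finditer(source):
        # Escape the literal text between links so it is shown verbatim.
        parts.append(html.escape(source[pos:match.start()]))
        parts.append(
            '<a href="{}">{}</a>'.format(
                html.escape(match.group(2)), html.escape(match.group(1))
            )
        )
        pos = match.end()
    parts.append(html.escape(source[pos:]))
    return "<code>" + "".join(parts) + "</code>"


if __name__ == "__main__":
    print(code_with_links('[mode](/docs/attrs/mode/)="hier"'))
```

This keeps the link and the rest of the snippet in one element, so no CSS changes are needed to make two adjacent code blocks line up.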