jgm/djot

Relax disallowing multiple words in `code block` / `div` first line?

chrisjsewell opened this issue · 12 comments

Currently

```name
content
```

:::name
content
:::

goes to

  code_block text="content\n" lang="name"
  div class="name"
    para
      str text="content"

but

```name a
content
```

:::name a
content
:::

goes to

  para
    verbatim text="name a\ncontent\n"
  para
    str text=":::name a"
    soft_break
    str text="content"
    soft_break
    str text=":::"

This feels a little unintuitive to me. Is there a strong reason why this has to be the case?

Could the "additional" first-line content not be stored on the AST nodes?
It would then not be used in the standard HTML renderer,
but could be used by macros
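
A minimal sketch of what that could look like, in Python rather than djot's own implementation. Everything here (`OPENER`, `Opener`, `parse_opener`, the `extra` field) is a hypothetical name for illustration, not part of djot's parser or AST:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical opener pattern: three or more backticks or colons,
# followed by whatever else is on the first line.
OPENER = re.compile(r"^(`{3,}|:{3,})[ \t]*(.*?)[ \t]*$")


@dataclass
class Opener:
    kind: str             # "code_block" or "div"
    name: Optional[str]   # first word (today's lang / class), or None
    extra: str            # remaining first-line content, "" if none


def parse_opener(line: str) -> Optional[Opener]:
    """Split an opening line into its first word and the leftover text."""
    m = OPENER.match(line)
    if m is None:
        return None
    marker, info = m.groups()
    parts = info.split(None, 1)  # split on the first run of whitespace
    name = parts[0] if parts else None
    extra = parts[1] if len(parts) > 1 else ""
    kind = "code_block" if marker[0] == "`" else "div"
    return Opener(kind, name, extra)
```

The standard HTML renderer could keep reading only `name`, while a macro or filter could look at `extra`.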

jgm commented

Let's see what GFM does with it:

``` python more
hi
```

becomes

hi

and there is no trace of `more` in the rendered HTML. So this behavior is implementing a kind of standard. But you're right that we could, in principle, treat the rest as additional attributes. But how? Split by spaces and make them classes? What if there is punctuation not normally allowed in classes?
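
To make the question concrete, here is a rough Python sketch of the "split by spaces" option. The `SAFE_CLASS` pattern is an assumption chosen for illustration (a conservative identifier-like rule), not djot's or HTML's definition of a valid class, and `split_extra` is a hypothetical helper:

```python
import re
from typing import List, Tuple

# Assumption: a deliberately conservative notion of a "safe" class name.
SAFE_CLASS = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]*$")


def split_extra(extra: str) -> Tuple[List[str], List[str]]:
    """Split on whitespace and return (usable classes, rejected tokens)."""
    classes: List[str] = []
    rejected: List[str] = []
    for token in extra.split():
        if SAFE_CLASS.match(token):
            classes.append(token)
        else:
            rejected.append(token)
    return classes, rejected
```

With this rule, `more {.x} key="v"` would yield one class (`more`) and two rejected tokens. That is the case that needs a policy: drop them, keep them verbatim under some key, or refuse to treat the line as a block opener.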

> Let's see what GFM does with it

Commonmark stores the entire first line as info: https://spec.commonmark.org/dingus/?text=%60%60%60name%20a%0Acontent%0A%60%60%60%0A%0A

jgm commented

I've revised my note above.

> Split by spaces and make them classes? What if there is punctuation not normally allowed in classes?

Do you need to split it at all? Just have the whole string be the lang

For div, hmmm; firstly I would ask, is there really a need to have this "store the first word as a class" semantics?

You already have block attributes for setting classes, why not just store the whole string under a key as well and be done with it 😄

(thanks as always for the rabid rapid replies!)

jgm commented

rabid 🐕

Pandoc doesn't store these in the lang attribute; it adds them as classes. (This is the way it has always behaved, and changing it now is probably not a good idea.)

> rabid 🐕

😆 🤦‍♂️

It seems like Pandoc does not follow commonmark here 🤔 https://spec.commonmark.org/0.30/#info-string

jgm commented

Pandoc's commonmark and gfm and commonmark_x parsers will ignore the additional content in the case of code blocks. This could be modified to store the whole line in an info attribute, or perhaps to do so only if this content would differ from the class already stored.

Pandoc's markdown parser is different. Part of the motivation here is to avoid confusing inline code that happens to start at the beginning of the line and uses three backticks with a code block.

> Part of the motivation here is to avoid confusing inline code that happens to start at the beginning of the line and uses three backticks with a code block.

It feels like, if you have "committed" to writing three backticks at the start of the line, then you are expecting to write a code block.
I can't imagine there being any time that you actually want this as inline?
Note, commonmark prohibits backticks being in the info string (then it is parsed as inline), so you can still write inline:

```inline something```

just not

```inline something
```
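
For reference, the CommonMark rule described above can be sketched in Python as a single check on the opening line. `FENCE` and `backtick_fence_info` are hypothetical names, and the sketch covers only backtick fences (tilde fences have no such restriction on the info string):

```python
import re
from typing import Optional

# Up to three spaces of indentation, three or more backticks,
# then an info string that contains no backticks.
FENCE = re.compile(r"^ {0,3}(`{3,})([^`\n]*)$")


def backtick_fence_info(line: str) -> Optional[str]:
    """Return the info string if `line` opens a backtick fence, else None."""
    m = FENCE.match(line)
    if m is None:
        return None  # not a fence opener; the line is left to inline parsing
    return m.group(2).strip()
```

Under this check, an opening line with several words is still a fence opener, and the whole remainder is its info string. Only a line with further backticks after the opening run falls through to inline parsing.
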

jgm commented

``` ``Markdown code spans with ` inside them`` ``` can be quoted with ` ``` `.

Let's see how GH renders it:

``Markdown code spans with ` inside them`` can be quoted with ```.