Closer look at inline HTML in Markdown
Omikhleia opened this issue · 0 comments
A generalization of #13, with elements of analysis and discussion...
- Block elements, except `<hr>`
  - Pandoc has the extensions `native_divs` and `markdown_in_html_blocks`, both enabled by default.
    - As of yet, these extensions are not supported in Lunamark.
  - Lunamark has `writer.display_html`, which should work as in Pandoc with the above extensions disabled. (Should = I didn't test.)
    - This encompasses a lot of things (incl. `<table>`, for instance), most of which we cannot easily render in a satisfying way (even with the help of a third-party HTML parsing library, such as `htmlparser`, hinted at in #18).
- Block elements, the special case of `<hr>`
  - As it is not supposed to have any content (per the W3C HTML specification, although browsers try to accept it anyway), maybe we could give it special handling...
- Inline elements, except `<br>` and `<wbr>`
  - Inline Markdown is valid in them.
  - Pandoc has the `native_spans` extension, enabled by default, for `<span>` elements.
    - With it, spans are transformed into the equivalent of `bracketed_spans`, respecting the structure (i.e. the content is below the `Pandoc.Span` element):
      ```
      $ echo '<span>content _italic_</span>' | pandoc -t json
      {"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para","c":[{"t":"Span","c":[["",[],[]],[{"t":"Str","c":"content"},{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]}]]}]}]}
      ```
    - Without it, and for any other inline element (e.g. `<sup>`, etc.), the HTML is spit out, but flattened. I.e. the structure is lost: one gets, at the same level, the opening tag, the content and the closing tag:
      ```
      $ echo '<span>content _italic_</span>' | pandoc -t json -f markdown-native_spans
      {"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para","c":[{"t":"RawInline","c":["html","<span>"]},{"t":"Str","c":"content"},{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]},{"t":"RawInline","c":["html","</span>"]}]}]}
      ```
  - Lunamark has `writer.inline_html`, which is technically equivalent to Pandoc's `markdown-native_spans`.
- Inline elements, the special case of `<br>` and `<wbr>`
  - As they are not supposed to have any content (per the W3C HTML specification, although browsers try to accept it anyway), maybe we could give them special handling...
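To make the "flattening" more concrete, here is a naive, stack-based sketch (in Python, purely for illustration; it is not part of Lunamark or Pandoc) that tries to rebuild the hierarchy from the flat `RawInline` sequence of the `markdown-native_spans` output above. The `regroup` function and the `HtmlElement` node are hypothetical names. It only pairs bare tags that are balanced within the same list of inlines and does not descend into nested containers such as `Emph`, so a tag opened inside emphasis and closed outside of it stays flat: a hint of why such reconstruction quickly gets clumsy.

```python
import json
import re

# A bare opening tag such as "<span>" or '<sup class="x">'; captures the tag name.
OPEN_TAG = re.compile(r"^<([A-Za-z][A-Za-z0-9]*)\b[^>]*>$")
# A closing tag such as "</span>"; captures the tag name.
CLOSE_TAG = re.compile(r"^</([A-Za-z][A-Za-z0-9]*)\s*>$")
# Void elements have no closing tag, so they must never be pushed on the stack.
VOID = {"br", "wbr", "hr", "img"}


def regroup(inlines):
    """Naively nest a flat list of Pandoc inlines around matching HTML tag pairs.

    Each balanced <tag>...</tag> pair becomes a hypothetical
    {"t": "HtmlElement", "tag": name, "c": children} node (not a Pandoc type).
    Only the given list is scanned: nested containers such as Emph are not visited.
    """
    root = []
    # Each frame: (tag name, opening RawInline node, collected children).
    stack = [(None, None, root)]
    for node in inlines:
        if node["t"] == "RawInline" and node["c"][0] == "html":
            raw = node["c"][1]
            opening = OPEN_TAG.match(raw)
            closing = CLOSE_TAG.match(raw)
            if opening and opening.group(1).lower() not in VOID and not raw.endswith("/>"):
                stack.append((opening.group(1).lower(), node, []))
                continue
            if closing and len(stack) > 1 and stack[-1][0] == closing.group(1).lower():
                tag, _, children = stack.pop()
                stack[-1][2].append({"t": "HtmlElement", "tag": tag, "c": children})
                continue
        # Text, other inlines, void tags and mismatched closing tags stay as they are.
        stack[-1][2].append(node)
    # Tags left open had no partner: put the opening tag back, flat, before its content.
    while len(stack) > 1:
        _, opener, children = stack.pop()
        stack[-1][2].append(opener)
        stack[-1][2].extend(children)
    return root


if __name__ == "__main__":
    # Output of `pandoc -t json -f markdown-native_spans` quoted in the analysis above.
    flat = json.loads(
        '{"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para","c":['
        '{"t":"RawInline","c":["html","<span>"]},{"t":"Str","c":"content"},'
        '{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]},'
        '{"t":"RawInline","c":["html","</span>"]}]}]}'
    )
    print(json.dumps(regroup(flat["blocks"][0]["c"])))
```

On the flattened example, `regroup` returns a single `HtmlElement` whose children are the same `Str`/`Space`/`Emph` nodes that `native_spans` places under `Pandoc.Span`; anything less well-behaved (unclosed, mis-nested, or split across containers) is simply left flat.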
Preliminary conclusions
- Block elements are hard to reach...
  - Though `<hr/>` could be done more easily. But is it worth the effort and the testing anyway, as Markdown supports horizontal rules and we can even achieve nice things with them (#27)?
- Inline elements are hard to reach due to their "flattening", which loses the hierarchy tree (... and reconstructing it is probably not a very clever approach). So we can't have e.g. `<sup>` working without much additional logic. The fact that they allow Markdown content also makes the use of an HTML parsing library very clumsy...
  - Though `<span>` could be supported by implementing `native_spans` in Lunamark. But is it worth the effort?
- In all cases, `<br>` and `<wbr>` would be (decently) easy to support (#13).
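Void elements need no tree at all: a single raw chunk is the whole element, which is what makes them the easy case. A minimal sketch of such special handling, again in Python and with hypothetical function names (`void_element`, `handle_raw_inline`, `handle_raw_block`), could map `<br>` to Pandoc's `LineBreak`, `<wbr>` to a zero-width space (U+200B, one common way of expressing a break opportunity without a visible glyph), and a raw `<hr>` block to `HorizontalRule`, passing anything else through unchanged. How a Lunamark writer would actually emit these is left open here.

```python
import re

# <br>, <wbr> or <hr>, in any case, optionally with attributes and/or a self-closing
# slash, e.g. "<br>", "<BR />", '<hr class="fancy"/>'. Closing forms like "</br>" are rejected.
VOID_TAG = re.compile(r"^<(br|wbr|hr)\b[^>]*>$", re.IGNORECASE)


def void_element(raw_html):
    """Return "br", "wbr" or "hr" if raw_html is one of those void tags, else None."""
    match = VOID_TAG.match(raw_html.strip())
    return match.group(1).lower() if match else None


def handle_raw_inline(raw_html):
    """Hypothetical inline hook returning (element type, payload) in Pandoc AST naming."""
    tag = void_element(raw_html)
    if tag == "br":
        return ("LineBreak", None)      # same as a Markdown hard line break
    if tag == "wbr":
        return ("Str", "\u200b")        # zero-width space: a break opportunity, nothing visible
    return ("RawInline", raw_html)      # anything else: keep the current raw HTML behaviour


def handle_raw_block(raw_html):
    """Hypothetical block hook: only <hr> gets special treatment."""
    if void_element(raw_html) == "hr":
        return ("HorizontalRule", None)  # same as a Markdown horizontal rule
    return ("RawBlock", raw_html)


if __name__ == "__main__":
    for raw in ["<br>", "<wbr/>", "<sup>"]:
        print(raw, "->", handle_raw_inline(raw))
    print("<hr />", "->", handle_raw_block("<hr />"))
```

A plain pattern match is enough here precisely because there is no content and no closing tag to pair up.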