Omikhleia/markdown.sile

Closer look at inline HTML in Markdown

Omikhleia opened this issue · 0 comments

A generalization of #13, with some analysis and discussion...

  • Block elements, except <hr>

    • Pandoc has extensions native_divs and markdown_in_html_blocks, both enabled by default
    • As of now, Lunamark supports neither of these extensions
    • Lunamark has writer.display_html, which should behave like Pandoc with both extensions disabled (untested on my side).
    • This encompasses a lot of things (incl. <table> for instance), most of which we cannot easily render in a satisfying way (even with the help of a 3rd party HTML parsing library, such as htmlparser hinted at in #18)
  • Block elements, the special case of <hr>

    • As it is not supposed to have any content (per the W3C HTML specification, though browsers try to accept it), maybe we could handle it as a special case...
  • Inline elements, except br and wbr

    • Inline Markdown is valid in them.
    • Pandoc has extension native_spans enabled by default, for <span> elements
    • With it, spans are transformed into the equivalent bracketed_spans, preserving the structure (i.e. the content is nested below the Pandoc.Span element)
      $ pandoc -t json
      <span>content _italic_</span>
      {"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para","c":[{"t":"Span","c":[["",[],[]],[{"t":"Str","c":"content"},{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]}]]}]}]}
      
    • Without it, and for any other inline element (e.g. <sup>), the HTML is emitted as raw inlines but flattened: the structure is lost, and the opening tag, the content and the closing tag all end up at the same level
      $ pandoc -t json -f markdown-native_spans
      <span>content _italic_</span>
      {"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para","c":[{"t":"RawInline","c":["html","<span>"]},{"t":"Str","c":"content"},{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]},{"t":"RawInline","c":["html","</span>"]}]}]}
      
    • Lunamark has writer.inline_html which is technically equivalent to Pandoc's markdown-native_spans
  • Inline elements, the special case of br and wbr

    • As they are not supposed to have any content (per the W3C HTML specification, though browsers try to accept it), maybe we could handle them as a special case...
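
To make the nested vs. flattened difference concrete, here is a minimal Python sketch that prints an indented outline of the two Para nodes from the Pandoc JSON outputs quoted above. The helper names (children, outline) are mine, for illustration only, and only the handful of node types appearing in those two outputs are handled.

```python
import json

# The "Para" blocks copied from the two pandoc outputs quoted in this issue.
NESTED = json.loads(
    '{"t":"Para","c":[{"t":"Span","c":[["",[],[]],[{"t":"Str","c":"content"},'
    '{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]}]]}]}'
)
FLAT = json.loads(
    '{"t":"Para","c":[{"t":"RawInline","c":["html","<span>"]},'
    '{"t":"Str","c":"content"},{"t":"Space"},'
    '{"t":"Emph","c":[{"t":"Str","c":"italic"}]},'
    '{"t":"RawInline","c":["html","</span>"]}]}'
)


def children(node):
    """Return the inline children of a node (hypothetical helper).

    Only covers the node types that occur in the two examples.
    """
    kind = node["t"]
    if kind in ("Para", "Emph"):
        return node["c"]
    if kind == "Span":
        return node["c"][1]  # Span content is [attributes, inlines]
    return []  # Str, Space and RawInline are leaves here


def outline(node, depth=0):
    """Return one indented line per node, depth-first."""
    lines = ["  " * depth + node["t"]]
    for child in children(node):
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(NESTED)))
    print("---")
    print("\n".join(outline(FLAT)))
```

In the first outline, Str, Space and Emph sit one level below Span. In the second, the two RawInline nodes are plain siblings of the content, so a writer receiving them one by one has no direct way to know which content a tag encloses.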

Preliminary conclusions

  • Block elements are hard to reach...
    • <hr/> could be done more easily, though. But is it worth the implementation and testing effort, given that Markdown already supports horizontal rules and we can even achieve nice things with them (#27)?
  • Inline elements are hard to reach because the "flattening" loses the hierarchy (... and reconstructing it is probably not a very clever approach). In other words, we can't have e.g. <sup> working without much additional logic. The fact that they allow Markdown content also makes the use of an HTML parsing library very clumsy...
    • Though <span> could be supported by implementing native_spans in Lunamark - but is it worth the effort?
    • In any case, <br> and <wbr> would be reasonably easy to support (#13)
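
As a rough illustration of why <br> and <wbr> are the easy case: being void elements, each arrives as a single self-sufficient raw inline, so no hierarchy needs to be rebuilt. The Python sketch below is not Lunamark code; the function name and the returned action names ("linebreak", "break-opportunity") are hypothetical placeholders for whatever the writer would actually emit.

```python
import re

# Matches a lone <br> or <wbr> tag, with optional whitespace and self-closing slash.
# Attributes are deliberately not handled in this sketch.
VOID_INLINE = re.compile(r"<(br|wbr)\s*/?>", re.IGNORECASE)

# Hypothetical mapping from tag name to a rendering action.
ACTIONS = {"br": "linebreak", "wbr": "break-opportunity"}


def classify_inline_html(raw):
    """Return an action name for a supported void tag, or None otherwise."""
    match = VOID_INLINE.fullmatch(raw.strip())
    if match is None:
        return None  # e.g. <sup>, </span>: would need the surrounding structure
    return ACTIONS[match.group(1).lower()]


if __name__ == "__main__":
    for raw in ["<br>", "<BR/>", "<wbr />", "<sup>", "</span>"]:
        print(raw, "->", classify_inline_html(raw))
```

Anything that returns None here (opening or closing tags of non-void elements) is exactly the flattened case discussed above, and would be ignored or passed through unchanged.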