github/cmark-gfm

Don't use empty attribute syntax in HTML renderer

karen-arutyunov opened this issue

Currently, rendering footnotes to HTML produces element attributes that use the empty attribute syntax. For example:

```html
<p>Foo bar.<sup class="footnote-ref"><a href="#fn-examples" id="fnref-examples" data-footnote-ref>1</a></sup></p>
<section class="footnotes" data-footnotes>
<ol>
<li id="fn-examples">
<p>Foos and bars.<a href="#fnref-examples" class="footnote-backref" data-footnote-backref aria-label="Back to content">↩</a></p>
</li>
</ol>
</section>
```

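Note the attributes prefixed with `data-` (`data-footnote-ref`, `data-footnotes`, `data-footnote-backref`): they are written without any value.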

While this syntax is perfectly valid in HTML5, the result is not a valid XHTML fragment, because XML requires every attribute to have an explicit, quoted value. Could you change the HTML renderer so that it emits such attributes with explicit empty values instead? The proposed patch is attached.

With that change, the example above would become a valid XHTML fragment:

```html
<p>Foo bar.<sup class="footnote-ref"><a href="#fn-examples" id="fnref-examples" data-footnote-ref="">1</a></sup></p>
<section class="footnotes" data-footnotes="">
<ol>
<li id="fn-examples">
<p>Foos and bars.<a href="#fnref-examples" class="footnote-backref" data-footnote-backref="" aria-label="Back to content">↩</a></p>
</li>
</ol>
</section>
```
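To illustrate the requested renderer behavior, here is a minimal standalone sketch in C. It deliberately does not use cmark-gfm's internal buffer API: `out_buf`, `put`, `put_empty_attr`, and `render_footnote_ref` are hypothetical names chosen only for this illustration. The idea it shows is that an attribute with no meaningful value is always written as `name=""`, which is valid HTML5 and also well-formed XML.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-capacity output buffer for this sketch only
   (cmark-gfm uses its own buffer type internally). */
typedef struct {
  char data[512];
  size_t len;
} out_buf;

/* Append a NUL-terminated string, silently truncating if the buffer is full. */
static void put(out_buf *b, const char *s) {
  size_t n = strlen(s);
  size_t room = sizeof b->data - 1 - b->len;
  if (n > room) n = room;
  memcpy(b->data + b->len, s, n);
  b->len += n;
  b->data[b->len] = '\0';
}

/* Write an attribute that carries no meaningful value as ` name=""`,
   instead of the bare HTML5 form ` name`. */
static void put_empty_attr(out_buf *b, const char *name) {
  put(b, " ");
  put(b, name);
  put(b, "=\"\"");
}

/* Render a footnote reference shaped like the one in the example above. */
static void render_footnote_ref(out_buf *b, const char *label, const char *num) {
  put(b, "<sup class=\"footnote-ref\"><a href=\"#fn-");
  put(b, label);
  put(b, "\" id=\"fnref-");
  put(b, label);
  put(b, "\"");
  put_empty_attr(b, "data-footnote-ref");
  put(b, ">");
  put(b, num);
  put(b, "</a></sup>");
}
```

Calling `render_footnote_ref(&b, "examples", "1")` on a zero-initialized `out_buf` leaves `<sup class="footnote-ref"><a href="#fn-examples" id="fnref-examples" data-footnote-ref="">1</a></sup>` in `b.data`, matching the corresponding part of the expected output above.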

Let me also give you some background on our use case, so you can see why this would be useful.

In the build2 toolchain project, we embed third-party package descriptions, potentially written in GFM, into the repository web pages. We also need to truncate these descriptions so that they do not exceed a reasonable number of displayed characters, while preserving the markup. To achieve that, we use the libcmark-gfm API to convert the descriptions to HTML. We then parse the resulting HTML with an XML parser until the content limit is reached, making sure that all opened elements are closed. Finally, we serialize the truncated XHTML fragment as part of the repository web page.
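The truncation step can be sketched roughly as follows. This is not build2's actual code: `truncate_xhtml`, `MAX_DEPTH`, and `MAX_NAME` are hypothetical, and instead of a real XML parser it uses a small hand-written scanner that assumes a well-formed XHTML fragment with no comments, CDATA sections, or processing instructions. It counts every text character (including newlines between elements), treating each entity reference and each UTF-8 sequence as one character, stops at the limit, and then closes whatever elements are still open. A real XML parser used at this stage would reject the bare `data-footnote-ref` form outright, which is the problem described above.

```c
#include <string.h>

#define MAX_DEPTH 64 /* assumed maximum element nesting depth */
#define MAX_NAME 32  /* assumed maximum tag-name length, including NUL */

/* Append n bytes of s to out; fail if the buffer would overflow. */
static int emit(char *out, size_t cap, size_t *len, const char *s, size_t n) {
  if (*len + n >= cap) return -1; /* keep one byte for the terminating NUL */
  memcpy(out + *len, s, n);
  *len += n;
  return 0;
}

/* Copy at most `limit` text characters of a well-formed XHTML fragment into
   `out`, then close every element that is still open. Returns 0 on success,
   -1 if the input is not in the simple form assumed here or `out` is too small. */
int truncate_xhtml(const char *in, size_t limit, char *out, size_t cap) {
  char open[MAX_DEPTH][MAX_NAME]; /* stack of currently open element names */
  int depth = 0;
  size_t shown = 0, len = 0;
  const char *p = in;

  if (cap == 0) return -1;
  while (*p && shown < limit) {
    if (*p == '<') {
      /* Find the closing '>', ignoring any inside quoted attribute values. */
      const char *q = p + 1;
      char quote = 0;
      while (*q && (quote || *q != '>')) {
        if (quote) {
          if (*q == quote) quote = 0;
        } else if (*q == '"' || *q == '\'') {
          quote = *q;
        }
        q++;
      }
      if (*q != '>') return -1; /* unterminated tag */

      if (p[1] == '/') { /* end tag: pop the innermost open element */
        if (depth == 0) return -1;
        depth--;
      } else if (q[-1] != '/') { /* start tag that is not self-closing: push */
        size_t n = strcspn(p + 1, " \t\r\n/>");
        if (n == 0 || n >= MAX_NAME || depth == MAX_DEPTH) return -1;
        memcpy(open[depth], p + 1, n);
        open[depth][n] = '\0';
        depth++;
      }
      if (emit(out, cap, &len, p, (size_t)(q - p) + 1) != 0) return -1;
      p = q + 1;
    } else {
      /* One displayed character: a whole entity reference or a whole UTF-8
         sequence, so truncation never splits either of them. */
      size_t n = 1;
      if (*p == '&') {
        const char *semi = strchr(p, ';');
        if (semi == NULL) return -1;
        n = (size_t)(semi - p) + 1;
      } else {
        while ((((unsigned char)p[n]) & 0xC0) == 0x80) n++;
      }
      if (emit(out, cap, &len, p, n) != 0) return -1;
      shown++;
      p += n;
    }
  }

  /* Close the elements that are still open, innermost first. */
  while (depth > 0) {
    const char *name = open[--depth];
    if (emit(out, cap, &len, "</", 2) != 0 ||
        emit(out, cap, &len, name, strlen(name)) != 0 ||
        emit(out, cap, &len, ">", 1) != 0)
      return -1;
  }
  out[len] = '\0';
  return 0;
}
```

For example, truncating `<p>Foo <em>bar</em> baz.</p>` to 5 characters yields `<p>Foo <em>b</em></p>`, and a limit larger than the text leaves the fragment unchanged.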

fix-html-empty-attributes.patch.gz