Unicode headers produce invalid anchors
mitchelltd opened this issue · 1 comments
When rdiscount processes headers to produce anchors (for use in TOC generation), it converts the UTF-8 input into ASCII-8BIT. In the process, non-ASCII characters are turned into question marks, and the question mark is a reserved character in URLs.
Example:
irb(main):001:0> require 'rdiscount'
=> true
irb(main):002:0> test = "# Précis"
=> "# Précis"
irb(main):003:0> rd = RDiscount.new(test, :generate_toc)
=> #<RDiscount:0x007f92d2026630 @text="# Précis", @generate_toc=true>
irb(main):004:0> puts rd.toc_content
<ul>
<li><a href="#Pr?.cis">Précis</a></li>
</ul>
=> nil
irb(main):005:0> test.encoding
=> #<Encoding:UTF-8>
irb(main):006:0> (rd.toc_content).encoding
=> #<Encoding:ASCII-8BIT>
It is worth comparing this outcome with that of GitLab Flavored Markdown, which preserves Unicode characters in link IDs.
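For illustration, here is a minimal sketch of anchor generation that keeps non-ASCII letters, roughly in the spirit of the Unicode-preserving behaviour described above. The helpers `unicode_anchor` and `anchor_href` are hypothetical names, not part of RDiscount or GitLab, and the slug rules (lowercase, drop punctuation, spaces to hyphens) are assumptions rather than either project's actual algorithm.

```ruby
require "erb"

# Hypothetical helper, not part of RDiscount: build an anchor ID from header
# text while keeping non-ASCII letters intact.
def unicode_anchor(text)
  text.strip
      .downcase
      .gsub(/[^\p{Word}\- ]/, "") # drop punctuation; keep letters, digits, "_", "-" and spaces
      .tr(" ", "-")               # spaces become hyphens
end

# Hypothetical helper: percent-encode the anchor so the href contains only
# ASCII characters that are safe in a URL.
def anchor_href(text)
  "#" + ERB::Util.url_encode(unicode_anchor(text))
end

puts unicode_anchor("Précis")
puts anchor_href("Précis")
```

With this approach the `id` attribute can carry the Unicode form while the `href` carries the percent-encoded form, so no character is replaced by a question mark.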