commonmark/cmark

Percent encoding of ~

Closed this issue · 3 comments

Currently cmark percent-encodes ~ but does not do so for . _ -

All four of them are unreserved characters. Shouldn't ~ be left unencoded as well?

jgm commented

I don't know; this comes from houdini_href_e.c, which was originally from GitHub.
It seems that ~ was required to be encoded in the past, and maybe the code is just playing it safe:
https://jkorpela.fi/tilde.html

RFC 3986, section 2.3, says:

For consistency, percent-encoded octets in the ranges of ALPHA
(%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
underscore (%5F), or tilde (%7E) should not be created by URI
producers and, when found in a URI, should be decoded to their
corresponding unreserved characters by URI normalizers.
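
To make the normalizer half of that paragraph concrete, here is a minimal sketch in C. It is not taken from cmark: `is_unreserved`, `hex_value` and `normalize_unreserved` are hypothetical helper names, and the sketch assumes a NUL-terminated, writable buffer. It decodes only those %XX triplets whose octet is an unreserved character and leaves every other triplet as it is.

```c
#include <stddef.h>

/* RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" */
int is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
int hex_value(unsigned char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: rewrites `uri` in place. The result is never longer
   than the input, so no extra buffer is needed. */
void normalize_unreserved(char *uri) {
    char *out = uri;
    const char *in = uri;
    while (*in != '\0') {
        if (in[0] == '%') {
            /* Only look at in[2] once in[1] is known to be a hex digit,
               so the terminating NUL is never read past. */
            int hi = hex_value((unsigned char)in[1]);
            int lo = (hi >= 0) ? hex_value((unsigned char)in[2]) : -1;
            if (hi >= 0 && lo >= 0 &&
                is_unreserved((unsigned char)(hi * 16 + lo))) {
                *out++ = (char)(hi * 16 + lo);
                in += 3;
                continue;
            }
        }
        *out++ = *in++; /* copy everything else through unchanged */
    }
    *out = '\0';
}
```

Called on a buffer holding `/%7Euser/a%2Db%20c`, this leaves `/~user/a-b%20c`: the tilde and hyphen are decoded, while `%20` stays encoded because a space is not unreserved.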

jgm commented

I'm happy to change this if you want to submit a PR.
Probably we just need to change one item in the array in houdini_href_e.c (the entry for ~) from 0 to 1, since that table marks the characters that are left unescaped with 1.
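
As a rough illustration of why a one-entry change is enough, here is a minimal table-driven escaper in C. It is a sketch, not the code in houdini_href_e.c: `SAFE_PUNCT`, `is_safe` and `escape_href_sketch` are hypothetical names, the set of safe punctuation is deliberately small, and the sketch assumes that 1 means "copy the byte through unchanged". With the opposite polarity the flip would go the other way.

```c
#include <stddef.h>

/* Hypothetical lookup table: 1 = emit the byte as-is, 0 = percent-encode it.
   Entries that are not listed default to 0. */
static const unsigned char SAFE_PUNCT[256] = {
    ['-'] = 1, ['.'] = 1, ['_'] = 1,
    ['~'] = 1, /* the single entry a change like the one proposed would flip */
    ['/'] = 1,
};

int is_safe(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || SAFE_PUNCT[c];
}

/* Writes the escaped form of `src` into `dst` (capacity `cap`, including the
   terminating NUL). Returns 0 on success, -1 if `dst` is too small. */
int escape_href_sketch(const char *src, char *dst, size_t cap) {
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;
    if (cap == 0) return -1;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (is_safe(c)) {
            if (n + 1 >= cap) return -1; /* keep room for the NUL */
            dst[n++] = (char)c;
        } else {
            if (n + 3 >= cap) return -1;
            dst[n++] = '%';
            dst[n++] = hex[c >> 4];
            dst[n++] = hex[c & 0x0F];
        }
    }
    dst[n] = '\0';
    return 0;
}
```

With the `['~'] = 1` entry present, `/~user/a b` comes out as `/~user/a%20b`. Removing that one entry makes the same input produce `/%7Euser/a%20b`, which is the behaviour this issue is about.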