Is the `uri` field RFC3986 compliant?
koraa opened this issue · 2 comments
Hi; thanks for the great library :)
Now to the point:
Parsing the following markdown:

```markdown
# Foo
Hello World [link](<foo
bar>)
```

I get the following MDAST (represented as YAML, position info stripped):
```yaml
type: root
children:
  - type: heading
    depth: 1
    children:
      - type: text
        value: Foo
  - type: paragraph
    children:
      - type: text
        value: "Hello World "
      - type: link
        title: ~
        url: "foo\nbar"
        children:
          - type: text
            value: link
```

The AST looks pretty much as expected. The newline in the link is handled by simply including a newline (0x0A) character in the string, which also seems alright, but it caused some problems for us when using the AST, because we expected URLs to be RFC 3986 compliant; RFC 3986 requires that characters outside its unreserved and reserved sets (such as a newline) be percent-encoded.
Is this expected behavior?
We ran into this issue specifically when using JSON Schema to validate our mdast; do you have any recommendation on the best way to validate whether an mdast tree is compliant?
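One way to find such URLs without relying on a JSON Schema `format` is a small tree walk. The sketch below is a minimal TypeScript example under stated assumptions: it uses a simplified node shape rather than the real `@types/mdast` typings, `findNonRfc3986Urls` is a hypothetical helper name, and it only checks that each `url` uses characters RFC 3986 permits (unreserved, reserved, and `%XX` escapes), not the full URI grammar.

```typescript
// Simplified mdast-like node (an assumption: the real typings in
// @types/mdast are more detailed). The index signature lets nodes carry
// other fields such as `value`, `depth`, or `title`.
interface MdastNode {
  type: string;
  url?: string;
  children?: MdastNode[];
  [key: string]: unknown;
}

// Characters RFC 3986 allows in a URI reference: unreserved, gen-delims,
// sub-delims, and "%" followed by two hex digits. This is a character-set
// check only; it does not validate the full URI grammar.
const RFC3986_CHARS = /^(?:[A-Za-z0-9\-._~:\/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*$/;

// Hypothetical helper: collect every `url` in the tree (on links, images,
// definitions, ...) that contains characters outside that set.
function findNonRfc3986Urls(node: MdastNode, found: string[] = []): string[] {
  if (typeof node.url === "string" && !RFC3986_CHARS.test(node.url)) {
    found.push(node.url);
  }
  for (const child of node.children ?? []) {
    findNonRfc3986Urls(child, found);
  }
  return found;
}

// The tree from above, written as a plain object.
const tree: MdastNode = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Foo" }] },
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Hello World " },
        { type: "link", title: null, url: "foo\nbar", children: [{ type: "text", value: "link" }] },
      ],
    },
  ],
};

const flagged = findNonRfc3986Urls(tree); // → ["foo\nbar"]
```

Reporting offending URLs, rather than rejecting the whole tree, leaves the mdast itself untouched, so the check can run alongside a looser schema.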
The markdown you posted is valid by default, but not if CommonMark is turned on, as you can see rendered here on GitHub:
```markdown
Hello World [link](<foo
bar>)
```

↓

Hello World [link]()
...because in CommonMark, a line ending cannot appear inside this construct (a link destination wrapped in angle brackets).
I suggest using CommonMark and avoiding white-space in links.
We do not change URLs in mdast, so that we can also create markdown again. This is expected behaviour, so I suggest using a laxer JSON schema.
However, when converting to a format like HTML, I do think we should encode the URLs. If you’re doing something like that and it doesn’t work, please let us know.
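Encoding at output time, while keeping the mdast `url` as written, could look roughly like the sketch below. It is a minimal TypeScript example, not remark's own implementation; `encodeUrlForOutput` is a hypothetical name, and it assumes the same RFC 3986 character set as a reasonable notion of "safe".

```typescript
// Hypothetical helper (not remark's own implementation): percent-encode a
// URL at output time, e.g. before writing it into an HTML href.
// - "%" followed by two hex digits is kept as-is, so existing escapes
//   are not double-encoded.
// - Any other "%" is encoded as "%25".
// - Runs of characters RFC 3986 does not allow (spaces, newlines,
//   non-ASCII, ...) are encoded as UTF-8 via encodeURIComponent.
// Note: encodeURIComponent throws a URIError on lone surrogates.
function encodeUrlForOutput(url: string): string {
  return url.replace(
    /%(?![0-9A-Fa-f]{2})|[^A-Za-z0-9\-._~:\/?#\[\]@!$&'()*+,;=%]+/g,
    (match) => encodeURIComponent(match),
  );
}

encodeUrlForOutput("foo\nbar"); // → "foo%0Abar"
encodeUrlForOutput("a%20b c");  // → "a%20b%20c" (existing escape kept)
encodeUrlForOutput("100%");     // → "100%25"
```

Plain `encodeURI` would also turn the newline into `%0A`, but it encodes every `%`, so an already-escaped `a%20b` would become `a%2520b`; preserving valid `%XX` sequences avoids that.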
Ok! Thanks for the clarification!