Non isomorphic parsing/formatting for bold/italic with spaces
SamyPesse opened this issue · 7 comments
Initial checklist
- I read the support docs
- I read the contributing guide
- I agree to follow the code of conduct
- I searched issues and couldn’t find anything (or linked relevant results below)
Affected packages and versions
remark-parse@10.0.0
Link to runnable example
https://codesandbox.io/s/cocky-meitner-88li6
Steps to reproduce
To reproduce, parse the following markdown:
**Our **_**developer**_** guides** and APIs have a home of their own now.
Expected behavior
This markdown snippet works on GitHub:
Our developer guides and APIs have a home of their own now.
Actual behavior
The markdown snippet is being reprocessed as:
**Our **\_**developer**\_\*\* guides\*\* and APIs have a home of their own now.
Runtime
Node v14
Package manager
yarn v2
OS
Linux, macOS
Build and bundle tools
esbuild
To provide a bit more context: in our application, users can select text with leading/trailing spaces and format it as bold/italic, basically something like:
hello<bold> world </bold>!
This led to issues when generating markdown with remark, because the following is not valid markdown:
hello** world **!
So we implemented custom logic to trim the inner content and move the spaces outside the bold/italic and other marks. But this can lead to a more complex tree, and remark generated the following markdown:
**Our **_**developer**_** guides** and APIs have a home of their own now.
which it can't parse afterwards.
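For readers following along, the kind of trimming described above can be sketched roughly as follows. This is not the code used in our application and not a remark API: `hoistWhitespace` and `MARKS` are hypothetical names, and the sketch assumes an mdast-like tree of plain objects with `type`, `value` and `children` fields.

```javascript
// Hypothetical helper (not part of remark): move leading/trailing whitespace
// out of strong/emphasis nodes so the serialized markers hug the text.
const MARKS = new Set(['strong', 'emphasis'])

function hoistWhitespace(parent) {
  const out = []
  for (const node of parent.children) {
    if (!MARKS.has(node.type)) {
      // Recurse into other containers (links, list items, ...).
      out.push(node.children ? hoistWhitespace(node) : node)
      continue
    }
    // Handle nested marks first so their hoisted spaces become our edge text.
    const mark = hoistWhitespace(node)
    const kids = mark.children.slice()
    let before = ''
    let after = ''
    const first = kids[0]
    if (first && first.type === 'text') {
      const trimmed = first.value.trimStart()
      before = first.value.slice(0, first.value.length - trimmed.length)
      kids[0] = {...first, value: trimmed}
    }
    const last = kids[kids.length - 1]
    if (last && last.type === 'text') {
      const trimmed = last.value.trimEnd()
      after = last.value.slice(trimmed.length)
      kids[kids.length - 1] = {...last, value: trimmed}
    }
    // Drop text nodes that became empty; drop the mark if nothing is left.
    const rest = kids.filter((k) => k.type !== 'text' || k.value !== '')
    if (before) out.push({type: 'text', value: before})
    if (rest.length > 0) out.push({...mark, children: rest})
    if (after) out.push({type: 'text', value: after})
  }
  return {...parent, children: out}
}
```

Applied to a paragraph holding `hello`, a strong node with ` world `, and `!`, this yields the text nodes `hello` and a single space, a strong node with `world`, then a single space and `!`. Adjacent text nodes are left unmerged here, which is one way the tree ends up more complex.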
I'm seeing 2 issues:
- remark should probably trim the inner content of bold/italic/code to avoid generating invalid markup (e.g. it should generate **world** instead of ** world **).
- remark cannot parse this markdown that works on GitHub.
Likely related to syntax-tree/mdast-util-to-markdown#12
remark should probably trim the inner content of bold/italic/code to avoid generating invalid markup (e.g. it should generate **world** instead of ** world **).
I dunno on the first point. Your code here is generating an object model that is impossible to make with markdown syntax. Take the DOM:
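// Build a tree the HTML parser would never produce: an <h1> inside a <p>.
const p = document.createElement('p')
const h1 = document.createElement('h1')
h1.textContent = 'Hi!'
p.append(h1)
p.outerHTML // "<p><h1>Hi!</h1></p>"
// Serialize and reparse: the parser closes the <p> before the <h1>.
const d = document.createElement('div')
d.innerHTML = p.outerHTML
d.outerHTML // "<div><p></p><h1>Hi!</h1><p></p></div>"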
Especially with a vague language like markdown, I think there will always be cases that can easily be represented by JSON but are impossible to serialize/parse.
If you’re generating **Our **_**developer**_** guides**, why not generate **Our _developer_ guides** instead?
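One possible way to get from the first shape to the second is a small normalization pass before serializing. The sketch below is only an illustration under assumptions: `mergeStrong` is a hypothetical name, not something remark provides, and it works on an mdast-like tree of plain objects. It rewrites an emphasis node whose only child is strong into strong wrapping emphasis, then folds adjacent strong siblings together.

```javascript
// Hypothetical normalization pass over an mdast-like tree (plain objects).
function mergeStrong(parent) {
  const out = []
  for (let node of parent.children) {
    // Normalize descendants first.
    if (node.children) node = mergeStrong(node)

    // emphasis > strong (only child) becomes strong > emphasis.
    if (
      node.type === 'emphasis' &&
      node.children.length === 1 &&
      node.children[0].type === 'strong'
    ) {
      node = {
        type: 'strong',
        children: [{type: 'emphasis', children: node.children[0].children}]
      }
    }

    const prev = out[out.length - 1]
    if (prev && prev.type === 'strong' && node.type === 'strong') {
      // Fold this strong run into the previous one.
      out[out.length - 1] = {
        ...prev,
        children: [...prev.children, ...node.children]
      }
    } else {
      out.push(node)
    }
  }
  return {...parent, children: out}
}
```

For the three-node sequence strong(`Our `), emphasis(strong(`developer`)), strong(` guides`), this produces a single strong node containing text, emphasis and text, which is the tree behind the simpler markdown.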
remark cannot parse this markdown that works on GitHub
Sure! Minimal repro: *a *__*b*__* c*
Especially with a vague language like markdown, I think there will always be cases that can easily be represented by JSON but are impossible to serialize/parse.
Yes, I was wondering whether trimming spaces in bold/italic is something remark should handle or not. Maybe it's something we can implement as a plugin, similar to rehype-minify-whitespace.
Because I can imagine the confusion when the following tree generates invalid markdown:
{
type: 'paragraph',
children: [
{
type: 'strong',
children: [
{
type: 'text',
value: 'Hello ',
},
],
}
]
}
If you’re generating **Our **_**developer**_** guides**, why not generate **Our _developer_ guides** instead?
Yes, I'm looking at improving this on our side in our step which is going from our AST into the remark AST.
What do you care most about? That it’s readable markdown? Or that it works?
Because readable would always have such problems (also in Chinese and other languages).
There might be something to be done in CommonMark, e.g., <-** a **-> or so might be possible (although this looks horrible). A character to force them to open or close even when they currently can’t.
And a plugin as you mention might indeed be useful to a lot of folks.
Alternatively, inject HTML instead. <b>, <i> and such?
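As a rough illustration of the HTML route, strong/emphasis nodes could be swapped for inline `html` nodes before serializing. This is a sketch under assumptions, not an existing plugin: `marksToHtml` and `TAGS` are hypothetical names, and the tree is assumed to be mdast-like plain objects, where `html` nodes carry raw markup in `value`.

```javascript
// Hypothetical mapping from mark node types to HTML tag names.
const TAGS = new Map([
  ['strong', 'b'],
  ['emphasis', 'i']
])

// Replace strong/emphasis nodes with an opening html node, their children,
// and a closing html node, so inner spaces survive serialization.
function marksToHtml(parent) {
  const out = []
  for (const node of parent.children) {
    const next = node.children ? marksToHtml(node) : node
    const tag = TAGS.get(next.type)
    if (tag) {
      out.push(
        {type: 'html', value: `<${tag}>`},
        ...next.children,
        {type: 'html', value: `</${tag}>`}
      )
    } else {
      out.push(next)
    }
  }
  return {...parent, children: out}
}
```

This version converts every mark; a more selective variant could fall back to HTML only when a mark starts or ends with whitespace.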
I came up with a way to do it, I think: syntax-tree/unist#60 (comment).