`1**(abc)**2` is parsed incorrectly
JounQin opened this issue · 2 comments
Initial checklist
- I read the support docs
- I read the contributing guide
- I agree to follow the code of conduct
- I searched issues and couldn’t find anything (or linked relevant results below)
Affected packages and versions
remark-parse
Link to runnable example
Steps to reproduce
The HTML `1<strong>(abc)</strong>2` is transformed into markdown via `rehype-remark`, giving `1**(abc)**2`. Parsing that markdown again via `remark-parse` gives this AST:
```js
// ast
{
  "type": "root",
  "children": [
    {
      "type": "paragraph",
      "children": [
        {
          "type": "text",
          "value": "1**(abc)**2",
          "position": {
            "start": { "line": 1, "column": 1, "offset": 0 },
            "end": { "line": 1, "column": 12, "offset": 11 }
          }
        }
      ],
      "position": {
        "start": { "line": 1, "column": 1, "offset": 0 },
        "end": { "line": 1, "column": 12, "offset": 11 }
      }
    }
  ],
  "position": {
    "start": { "line": 1, "column": 1, "offset": 0 },
    "end": { "line": 2, "column": 1, "offset": 12 }
  }
}
```
Transforming that markdown into HTML via `remark-rehype` gives:
`<p>1**(abc)**2</p>`
Versions:
```json
{
  "rehype-parse": "^8.0.3",
  "rehype-remark": "^9.1.0",
  "rehype-stringify": "^9.0.2",
  "remark-parse": "^10.0.1",
  "remark-rehype": "^10.1.0"
}
```
Expected behavior
`**(abc)**` (parentheses inside `**`) is parsed as strong emphasis.
Actual behavior
`**(abc)**` is parsed as plain text.
Runtime
Node v16
Package manager
pnpm
OS
macOS
Build and bundle tools
No response
Comparing with the CommonMark reference parser at https://spec.commonmark.org/dingus/?text=1**(abc)**2, I think `1**(abc)**2` is being parsed appropriately here.
The rule for opening strong emphasis (the `**` run must be left-flanking):

> A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

and for closing it (the `**` run must be right-flanking):

> A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a Unicode punctuation character, or (2b) preceded by a Unicode punctuation character and followed by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

Neither condition seems to be met (see https://spec.commonmark.org/0.30/#emphasis-and-strong-emphasis): the first `**` is followed by the punctuation character `(` but preceded by `1`, so it is not left-flanking and cannot open; the second `**` is preceded by the punctuation character `)` but followed by `2`, so it is not right-flanking and cannot close.
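To make the flanking check concrete, here is a minimal TypeScript sketch that classifies each `*` run against the two definitions quoted above. It is not remark's or micromark's implementation: `classifyStarRuns`, `isLeftFlanking`, and `isRightFlanking` are hypothetical helpers, and the sketch assumes single-line input and ignores backslash escapes, code spans, and the "multiple of 3" matching rule.

```typescript
// Hypothetical sketch of the CommonMark 0.30 flanking rules for `*` runs.
// Assumptions: single-line input; escapes, code spans, and the
// "multiple of 3" rule are ignored.

// Unicode whitespace: category Zs, tab, LF, FF, CR.
// `undefined` stands for the beginning/end of the line, which counts as whitespace.
function isUnicodeWhitespace(ch: string | undefined): boolean {
  return ch === undefined || /^[\t\n\f\r\p{Zs}]$/u.test(ch);
}

// Unicode punctuation (0.30): ASCII punctuation or categories Pc, Pd, Pe, Pf, Pi, Po, Ps.
function isUnicodePunctuation(ch: string | undefined): boolean {
  return (
    ch !== undefined &&
    /^[\u0021-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u007E\p{P}]$/u.test(ch)
  );
}

// The delimiter run occupies chars[start] .. chars[end - 1].
function isLeftFlanking(chars: string[], start: number, end: number): boolean {
  const before = chars[start - 1]; // undefined at the start of the line
  const after = chars[end]; // undefined at the end of the line
  if (isUnicodeWhitespace(after)) return false; // (1) fails
  if (!isUnicodePunctuation(after)) return true; // (2a) holds
  return isUnicodeWhitespace(before) || isUnicodePunctuation(before); // (2b)
}

function isRightFlanking(chars: string[], start: number, end: number): boolean {
  const before = chars[start - 1];
  const after = chars[end];
  if (isUnicodeWhitespace(before)) return false; // (1) fails
  if (!isUnicodePunctuation(before)) return true; // (2a) holds
  return isUnicodeWhitespace(after) || isUnicodePunctuation(after); // (2b)
}

interface StarRun {
  start: number; // index of the first `*`, counted in code points
  length: number;
  canOpen: boolean; // a `*` run can open (strong) emphasis iff it is left-flanking
  canClose: boolean; // and can close iff it is right-flanking
}

function classifyStarRuns(text: string): StarRun[] {
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  const runs: StarRun[] = [];
  let i = 0;
  while (i < chars.length) {
    if (chars[i] !== "*") {
      i++;
      continue;
    }
    let j = i;
    while (j < chars.length && chars[j] === "*") j++;
    runs.push({
      start: i,
      length: j - i,
      canOpen: isLeftFlanking(chars, i, j),
      canClose: isRightFlanking(chars, i, j),
    });
    i = j;
  }
  return runs;
}

for (const text of ["1**(abc)**2", "1 **(abc)** 2"]) {
  for (const run of classifyStarRuns(text)) {
    console.log(
      `${JSON.stringify(text)} run at ${run.start}: canOpen=${run.canOpen}, canClose=${run.canClose}`,
    );
  }
}
// Expected output:
// "1**(abc)**2" run at 1: canOpen=false, canClose=true
// "1**(abc)**2" run at 8: canOpen=true, canClose=false
// "1 **(abc)** 2" run at 2: canOpen=true, canClose=false
// "1 **(abc)** 2" run at 9: canOpen=false, canClose=true
```

In `1**(abc)**2` the only run that can open comes after the only run that can close, so no strong emphasis forms, consistent with the single text node `remark-parse` produced. Adding spaces (`1 **(abc)** 2`) lets the runs open and close, but it also changes the text, which illustrates why an exact markdown equivalent of `1<strong>(abc)</strong>2` is hard to find.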
I'm not sure the HTML `1<strong>(abc)</strong>2` can be represented exactly in markdown.
A close equivalent may be adding zero-width space characters between `1` and `**`, as well as between `**` and `2`. 🤔
Yep, the actual behavior is how markdown works, and the expected behavior is not, so I’ll close this, as there’s nothing to change here.