`1(abc)2` is parsed incorrectly

Question

JounQin opened this issue 3 years ago · 2 comments

JounQin commented 3 years ago

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Affected packages and versions

remark-parse

Link to runnable example

ast explorer

Steps to reproduce

The following HTML content is transformed into markdown via rehype-remark:

1<strong>(abc)</strong>2

Transformed markdown:

1**(abc)**2

Parse the transformed markdown again via remark-parse:

// ast
{
  "type": "root",
  "children": [
    {
      "type": "paragraph",
      "children": [
        {
          "type": "text",
          "value": "1**(abc)**2",
          "position": {
            "start": {
              "line": 1,
              "column": 1,
              "offset": 0
            },
            "end": {
              "line": 1,
              "column": 12,
              "offset": 11
            }
          }
        }
      ],
      "position": {
        "start": {
          "line": 1,
          "column": 1,
          "offset": 0
        },
        "end": {
          "line": 1,
          "column": 12,
          "offset": 11
        }
      }
    }
  ],
  "position": {
    "start": {
      "line": 1,
      "column": 1,
      "offset": 0
    },
    "end": {
      "line": 2,
      "column": 1,
      "offset": 12
    }
  }
}

Transform the markdown content back into HTML via remark-rehype:

<p>1**(abc)**2</p>

Versions:

{
    "rehype-parse": "^8.0.3",
    "rehype-remark": "^9.1.0",
    "rehype-stringify": "^9.0.2",
    "remark-parse": "^10.0.1",
    "remark-rehype": "^10.1.0"
}

Expected behavior

`**(abc)**` is parsed as strong text, even when `()` sits directly inside `**` and digits or letters sit directly outside.

Actual behavior

`**(abc)**` is parsed as plain text.

Runtime

Node v16

Package manager

pnpm

OS

macOS

Build and bundle tools

No response

Answer 1 · 2021-12-09T11:50:15.000Z

Comparing with the CommonMark reference parser (https://spec.commonmark.org/dingus/?text=1**(abc)**2), I think `1**(abc)**2` is being parsed correctly here.

The rule for opening strong text

A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

And closing strong text

A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a Unicode punctuation character, or (2b) preceded by a Unicode punctuation character and followed by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

(https://spec.commonmark.org/0.30/#emphasis-and-strong-emphasis)

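Neither condition seems to be met here. The opening `**` is followed by punctuation (`(`) but preceded by `1`, so it is not left-flanking. The closing `**` is preceded by punctuation (`)`) but followed by `2`, so it is not right-flanking.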
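To make the two quoted rules concrete, here is a rough sketch that applies them to every `*` run in a string. It is not how remark-parse or micromark implement this: `flankingRuns`, `isWhitespace`, and `isPunctuation` are names made up for illustration, and the code looks at single UTF-16 code units only, so it assumes the neighbouring characters are in the Basic Multilingual Plane.

```javascript
// Sketch of the CommonMark 0.30 flanking tests (illustrative, not the library's code).

// Start/end of line (undefined here) counts as Unicode whitespace.
const isWhitespace = (ch) => ch === undefined || /^[\p{Zs}\t\n\f\r]$/u.test(ch);

// ASCII punctuation plus the Unicode P* general categories.
const isPunctuation = (ch) => ch !== undefined && /^[!-\/:-@\[-`{-~\p{P}]$/u.test(ch);

// Hypothetical helper: classify each run of `*` in `text`.
function flankingRuns(text) {
  return [...text.matchAll(/\*+/g)].map((match) => {
    const before = text[match.index - 1]; // undefined at start of line
    const after = text[match.index + match[0].length]; // undefined at end of line
    const left =
      !isWhitespace(after) &&
      (!isPunctuation(after) || isWhitespace(before) || isPunctuation(before));
    const right =
      !isWhitespace(before) &&
      (!isPunctuation(before) || isWhitespace(after) || isPunctuation(after));
    return { run: match[0], left, right };
  });
}

const [first, second] = flankingRuns('1**(abc)**2');
console.log(first.left, first.right); // false true
console.log(second.left, second.right); // true false
```

A `**` run can open strong text only if it is left-flanking and close it only if it is right-flanking, so in `1**(abc)**2` the first run cannot open and the second cannot close. With spaces around the delimiters, as in `1 **(abc)** 2`, the same function reports the first run as left-flanking and the second as right-flanking.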

I'm not sure the HTML 1<strong>(abc)</strong>2 can be represented precisely/exactly in markdown.

A close equivalent may be adding zero width space characters between 1 and ** as well as ** and 2. 🤔

Answer 2 · 2021-12-09T15:45:37.000Z

Yep, the actual behavior is how markdown works, and the expected behavior is not, so I’ll close this as it’s not a change here.