Mistakenly categorises :email: as SymbolNode + WordNode
Closed · 1 comment
Hello! Thank you for all of your work on this package.
A short example, using v4.3.0:
var inspect = require('unist-util-inspect')
var Latin = require('parse-latin')
var tree = new Latin().parse("You've got \u2709\uFE0F!")
// "You've got ✉️!"
console.log(inspect(tree))
RootNode[1] (1:1-1:15, 0-14)
└─ ParagraphNode[1] (1:1-1:15, 0-14)
   └─ SentenceNode[7] (1:1-1:15, 0-14)
      ├─ WordNode[3] (1:1-1:7, 0-6)
      │  ├─ TextNode: "You" (1:1-1:4, 0-3)
      │  ├─ PunctuationNode: "'" (1:4-1:5, 3-4)
      │  └─ TextNode: "ve" (1:5-1:7, 4-6)
      ├─ WhiteSpaceNode: " " (1:7-1:8, 6-7)
      ├─ WordNode[1] (1:8-1:11, 7-10)
      │  └─ TextNode: "got" (1:8-1:11, 7-10)
      ├─ WhiteSpaceNode: " " (1:11-1:12, 10-11)
      ├─ SymbolNode: "✉" (1:12-1:13, 11-12) <------- This is a U+2709
      ├─ WordNode[1] (1:13-1:14, 12-13) <------- 😢
      │  └─ TextNode: "️" (1:13-1:14, 12-13) <------- This is a U+FE0F
      └─ PunctuationNode: "!" (1:14-1:15, 13-14)
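To make the split easier to see outside the parser, here is a small sketch that lists the code points of the envelope sequence using only built-in JavaScript. The helper name `describeCodePoints` is made up for this illustration and is not part of parse-latin or unist-util-inspect.

```javascript
// Hypothetical helper (not part of parse-latin): format every code point
// of a string as "U+XXXX", padded to at least four hex digits.
function describeCodePoints(value) {
  return Array.from(value, function (character) {
    var hex = character.codePointAt(0).toString(16).toUpperCase()
    return 'U+' + hex.padStart(4, '0')
  })
}

// The envelope symbol followed by the emoji variation selector.
var envelope = '\u2709\uFE0F'

// The string is two UTF-16 code units long, even though it renders as one emoji.
console.log(envelope.length)
console.log(describeCodePoints(envelope))
```

The second entry is the variation selector, which is the character that ends up in its own WordNode in the tree above.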
I've traced this down from a bug I was experiencing in https://github.com/tbroadley/spellchecker-cli when I spellcheck markdown that uses the :email: shortcode. It is flagged as a spelling mistake, due to this extra U+FE0F. Some other emoji are affected, ones that are based on older symbols, such as ✂️ and
I had a bit of a go at fixing this but didn't get very far. I would be very grateful if you could point me in the right direction so I can submit a PR, though if you would prefer to handle it yourself I will be equally grateful!
Edit: In particular, I got stuck trying to figure out which, if any, of the modules in lib/plugin ought to be amended to correct this behaviour.
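For illustration, this is roughly the behaviour I would hope for, sketched on plain objects shaped like the nodes in the tree above. It is only a guess at the desired result: `mergeVariationSelectors` and `textOf` are hypothetical names, and the sketch does not use parse-latin's actual plugin API or any module in lib/plugin.

```javascript
// Sketch only: works on plain unist-like objects, not through parse-latin itself.
// Matches text made up solely of variation selectors (U+FE0E or U+FE0F).
var VARIATION_SELECTORS = /^[\uFE0E\uFE0F]+$/

// Read the text of a node, whether it is a literal or a parent.
function textOf(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(textOf).join('')
}

// Hypothetical helper: fold a WordNode that contains only variation
// selectors into the SymbolNode directly before it.
function mergeVariationSelectors(parent) {
  var children = parent.children
  var index = children.length

  // Walk backwards so removing a child does not shift unvisited indices.
  while (index--) {
    var node = children[index]
    var previous = children[index - 1]

    if (
      previous &&
      previous.type === 'SymbolNode' &&
      node.type === 'WordNode' &&
      VARIATION_SELECTORS.test(textOf(node))
    ) {
      previous.value += textOf(node)

      // Extend the symbol's position when both nodes carry one.
      if (previous.position && node.position) {
        previous.position.end = node.position.end
      }

      children.splice(index, 1)
    }
  }

  return parent
}

// A hand-written stand-in for the tail of the sentence above.
var sentence = {
  type: 'SentenceNode',
  children: [
    {type: 'SymbolNode', value: '\u2709'},
    {type: 'WordNode', children: [{type: 'TextNode', value: '\uFE0F'}]},
    {type: 'PunctuationNode', value: '!'}
  ]
}

mergeVariationSelectors(sentence)
console.log(sentence.children.map(function (child) { return child.type }))
```

After the call, the stand-in sentence holds a single SymbolNode carrying both code points, followed by the PunctuationNode, so a spellchecker would no longer see a stray word.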
That project uses retext to wrap this project. retext can use one of its plugins to add support for emoji (https://github.com/retextjs/retext-emoji)!