Handling emoji strings in `TextNode.cut()`
Closed this issue · 1 comments
alexanderjulmer commented
I might have found some unintended behaviour in `TextNode.cut()`. When the `TextNode`'s text ends in some emoji, `cut` seems to return odd results. Here's an example:
```python
import prosemirror
from prosemirror.utils import text_length
import codecs

schema = prosemirror.Schema(
    spec={
        "nodes": {
            "doc": {"content": "inline*"},
            "text": {"group": "inline"},
        },
    }
)

emoji_string = "Text with emoji 🫵"  # 17 characters, emoji is single character
text_node = schema.text(emoji_string)

# raises UnicodeDecodeError: 'utf-16-le' codec can't decode bytes
# in position 32-33: unexpected end of data
text_node.cut(0, 17)
```
This behaviour seems to arise because `node_before` computes the length of the text node differently than `text_length`.
I am still trying to investigate this further.
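For context, the error message above is consistent with a mismatch between counting Unicode code points and counting UTF-16 code units. The emoji is a single code point but occupies two UTF-16 code units (a surrogate pair), so an end offset of 17 falls in the middle of it when offsets are read as UTF-16 units. The following is a minimal standalone sketch of that mismatch using only the Python standard library. `utf16_slice` is a hypothetical helper written for illustration; it is not part of prosemirror and makes no claim about how `TextNode.cut()` is implemented.

```python
# "\U0001FAF5" is the emoji from the example above, written as an escape.
emoji_string = "Text with emoji \U0001FAF5"

# Python's len() counts Unicode code points.
code_points = len(emoji_string)  # 17

# UTF-16 uses two bytes per code unit; the emoji needs two code units.
code_units = len(emoji_string.encode("utf-16-le")) // 2  # 18


def utf16_slice(text: str, start: int, end: int) -> str:
    """Hypothetical helper: slice `text` by UTF-16 code unit offsets."""
    data = text.encode("utf-16-le")
    return data[start * 2 : end * 2].decode("utf-16-le")


# Slicing up to the UTF-16 length returns the whole string.
whole = utf16_slice(emoji_string, 0, code_units)

# Slicing up to the code point length splits the surrogate pair,
# so decoding the truncated bytes fails.
try:
    utf16_slice(emoji_string, 0, code_points)
except UnicodeDecodeError as exc:
    print("decode failed:", exc)
```

Under this reading, an offset of 18 rather than 17 would cover the full string whenever offsets are interpreted as UTF-16 code units.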
alexanderjulmer commented
Sorry folks, I wrongly traced the error to this library.