Handling emoji strings in `TextNode.cut()`

Question

Closed this issue 8 months ago · 1 comments

I might have found some unintended behaviour in TextNode.cut().

When the TextNode's text ends in an emoji, `cut` raises an error instead of returning the cut node. Here's an example:

import prosemirror
from prosemirror.utils import text_length

schema = prosemirror.Schema(
    spec={
        "nodes": {
            "doc": {"content": "inline*"},
            "text": {"group": "inline"},
        },
    }
)

emoji_string = "Text with emoji 🫵"  # 17 code points; the emoji is a single code point

text_node = schema.text(emoji_string)
text_node.cut(0, 17)    # raises UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 32-33: unexpected end of data

This behaviour seems to arise because `node_before` computes the length of the text node differently from `text_length`.
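The kind of length mismatch I suspect can be shown with plain Python, without the library. This is only a sketch, assuming positions are counted in UTF-16 code units while Python's `len()` counts code points; the helpers `utf16_length` and `cut_utf16` are hypothetical names for illustration, not part of prosemirror.

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units in text (hypothetical helper)."""
    return len(text.encode("utf-16-le")) // 2


def cut_utf16(text: str, start: int, end: int) -> str:
    """Slice text by UTF-16 code-unit offsets (hypothetical helper)."""
    raw = text.encode("utf-16-le")
    # Each UTF-16 code unit is two bytes in the little-endian encoding.
    return raw[2 * start : 2 * end].decode("utf-16-le")


# Same string as in the example above, with the emoji written as an escape.
emoji_string = "Text with emoji \U0001FAF5"

code_points = len(emoji_string)          # Python counts code points
code_units = utf16_length(emoji_string)  # the emoji needs a surrogate pair

# Cutting at the code-point length lands in the middle of the surrogate pair.
try:
    cut_utf16(emoji_string, 0, code_points)
    split_error = None
except UnicodeDecodeError as exc:
    split_error = exc

# Cutting at the code-unit length keeps the emoji intact.
whole = cut_utf16(emoji_string, 0, code_units)
```

If that assumption holds, an end offset taken from `len()` is one code unit short for this string, which would explain a decode error at the very end of the data.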

I am still trying to investigate this further.

Answer 1 · 2024-05-21T14:23:16.000Z

Sorry folks, I wrongly traced the error to this library.