tree-sitter/py-tree-sitter

UTF-16 encoding support is wanted

CallmeNezha opened this issue · 1 comment

I'm using py-tree-sitter in a text editor written with PyQt. Qt's QPlainTextEdit and QTextDocument use UTF-16, so I was wondering if you could add UTF-16 encoding support to this Python binding. tree-sitter itself supports UTF-16, but I see in binding.c that it uses UTF-8 as the default input encoding. Right now I have to convert Qt's string to UTF-8 before passing it to py-tree-sitter's parser, then take the positions and byte offsets it returns and map them back to cursor positions in the QTextDocument. This conversion is complex and error-prone.
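To illustrate the mapping described above, here is a minimal sketch of turning a UTF-8 byte offset (as reported for a UTF-8 parse) into a UTF-16 code unit offset. It assumes Qt cursor positions count UTF-16 code units; the function name `utf8_byte_to_utf16_offset` is hypothetical and not part of py-tree-sitter or Qt.

```python
def utf8_byte_to_utf16_offset(text: str, byte_offset: int) -> int:
    """Map an offset into text.encode("utf-8") to a UTF-16 code unit offset.

    Hypothetical helper for illustration; not part of py-tree-sitter.
    """
    data = text.encode("utf-8")
    if not 0 <= byte_offset <= len(data):
        raise ValueError("byte_offset is outside the encoded text")
    # Decoding the prefix raises UnicodeDecodeError if the offset
    # falls in the middle of a multi-byte character.
    prefix = data[:byte_offset].decode("utf-8")
    # Each UTF-16 code unit is 2 bytes; characters outside the BMP
    # (e.g. most emoji) take two code units (a surrogate pair).
    return len(prefix.encode("utf-16-le")) // 2


# "a" is 1 byte, "€" is 3 bytes, "😀" is 4 bytes in UTF-8,
# so "b" starts at byte offset 8.
sample = "a€😀b"
print(utf8_byte_to_utf16_offset(sample, 8))
```

Re-encoding the prefix on every call is linear in the document size, so an editor doing this for many nodes would probably want to cache per-line offsets instead. Native UTF-16 support in the binding would make this step unnecessary.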

PR welcome.

This should be replaced with ts_parser_parse_string_encoding based on the encoding of the buffer:

new_tree = ts_parser_parse_string(self->parser, old_tree, source_bytes, length);

I don't know if the callable version has any way of finding the encoding:

.encoding = TSInputEncodingUTF8,