openlawlibrary/pygls

Provide helpers to convert position using `positionEncoding`

Closed this issue · 2 comments

For LSP response messages that return ranges and positions, the character value of each position should, according to the protocol, be counted in UTF-16 code units (specifically UTF-16-LE/BE, depending on platform). The client advertises the position encodings it supports through the `positionEncodings` client capability, and the server selects one and reports it back through `positionEncoding` in its server capabilities. If neither is present, UTF-16 is assumed.

Example:
From:

s = '😊'

To:

s = "😊"

These are the different ranges for text edits based on positionEncoding
[image: text edit ranges for each position encoding]

UTF-16 (no BOM), UTF-8, UTF-32 (no BOM) code units respectively
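
To make the difference concrete, here is a minimal Python sketch of how the end column of that edit could be computed under each encoding. The helper names (`code_units`, `to_client_column`, `from_client_column`) are made up for illustration and are not part of pygls or the LSP specification.

```python
def code_units(text: str, encoding: str) -> int:
    """Length of `text` in code units of the given LSP position encoding."""
    if encoding == "utf-8":
        return len(text.encode("utf-8"))           # 1 unit = 1 byte
    if encoding == "utf-16":
        return len(text.encode("utf-16-le")) // 2  # 1 unit = 2 bytes
    if encoding == "utf-32":
        return len(text)                           # 1 unit = 1 code point
    raise ValueError(f"unsupported position encoding: {encoding}")


def to_client_column(line: str, index: int, encoding: str) -> int:
    """Convert a Python string index into a column in client code units."""
    return code_units(line[:index], encoding)


def from_client_column(line: str, column: int, encoding: str) -> int:
    """Convert a client column back into a Python string index.

    A column that falls inside a multi-unit character is rounded forward
    to the next character boundary (a choice made for this sketch).
    """
    units = 0
    for index, char in enumerate(line):
        if units >= column:
            return index
        units += code_units(char, encoding)
    return len(line)


line = 's = "😊"'
for enc in ("utf-16", "utf-8", "utf-32"):
    # The string literal starts at column 4 in every encoding;
    # only the end column differs.
    print(enc, 4, to_client_column(line, len(line), enc))
# utf-16 4 8
# utf-8 4 10
# utf-32 4 7
```

The emoji is one code point but takes two UTF-16 code units and four UTF-8 bytes, which is why the end column shifts while the start column does not.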

Reference:
https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#positionEncodingKind

		/**
		 * The position encodings supported by the client. Client and server
		 * have to agree on the same position encoding to ensure that offsets
		 * (e.g. character position in a line) are interpreted the same on both
		 * side.
		 *
		 * To keep the protocol backwards compatible the following applies: if
		 * the value 'utf-16' is missing from the array of position encodings
		 * servers can assume that the client supports UTF-16. UTF-16 is
		 * therefore a mandatory encoding.
		 *
		 * If omitted it defaults to ['utf-16'].
		 *
		 * Implementation considerations: since the conversion from one encoding
		 * into another requires the content of the file / line the conversion
		 * is best done where the file is read which is usually on the server
		 * side.
		 *
		 * @since 3.17.0
		 */
		positionEncodings?: PositionEncodingKind[];
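
The negotiation described in that comment could be sketched roughly as follows. The function `choose_position_encoding` and its `server_preference` parameter are hypothetical names used only for this illustration; the preference order is an assumption, not something the specification prescribes.

```python
from typing import Optional, Sequence


def choose_position_encoding(
    client_encodings: Optional[Sequence[str]],
    server_preference: Sequence[str] = ("utf-32", "utf-8", "utf-16"),
) -> str:
    """Pick the position encoding the server will report back.

    `server_preference` is an assumed ordering for this sketch.
    """
    # If the capability is omitted, it defaults to ['utf-16'].
    offered = list(client_encodings) if client_encodings else ["utf-16"]
    # UTF-16 is mandatory, so it is treated as supported even when unlisted.
    if "utf-16" not in offered:
        offered.append("utf-16")
    for encoding in server_preference:
        if encoding in offered:
            return encoding
    return "utf-16"


print(choose_position_encoding(None))                  # utf-16
print(choose_position_encoding(["utf-8", "utf-16"]))   # utf-8
```

Falling back to UTF-16 keeps the server compatible with clients that predate 3.17 and never send the capability.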

I believe we already support this? https://github.com/openlawlibrary/pygls/blob/main/pygls/workspace/position_codec.py

Unless I'm misunderstanding your description?

Yes it is. Thank you :)