`pygls` chooses utf-16 encoding when client prefers utf-32 (which would be faster for `pygls` as well)
As an optimization, I feel that `pygls` should choose the `utf-32` encoding if the editor prefers it over `utf-16`.
Looking a bit at the code, it looks like `_with_position_encodings` is responsible:
```python
def _with_position_encodings(self):
    self.server_cap.position_encoding = types.PositionEncodingKind.Utf16

    general = self.client_capabilities.general
    if general is None:
        return self

    encodings = general.position_encodings
    if encodings is None:
        return self

    if types.PositionEncodingKind.Utf16 in encodings:
        return self

    if types.PositionEncodingKind.Utf32 in encodings:
        self.server_cap.position_encoding = types.PositionEncodingKind.Utf32
        return self

    if types.PositionEncodingKind.Utf8 in encodings:
        self.server_cap.position_encoding = types.PositionEncodingKind.Utf8
        return self

    logger.warning(f"Unknown `PositionEncoding`s: {encodings}")
    return self
```
The code here looks like it does encoding negotiation. However, because the function checks whether the client supports UTF-16 before ever consulting the client's preference order, in practice the outcome will always be UTF-16, unless the editor explicitly hides the fact that it supports UTF-16 (which it is required to support). This holds even when both parties would have agreed on a better alternative. Notably, UTF-32 is advantageous for `pygls`, since it makes all position-related operations trivial: Python strings are already indexed by code points.
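To illustrate that claim (a standalone sketch, not `pygls` code; the helper `utf16_col_to_index` is hypothetical), a UTF-16 column has to be converted by walking the line, while a UTF-32 column is simply a Python string index:

```python
def utf16_col_to_index(line: str, utf16_col: int) -> int:
    """Convert a UTF-16 code-unit offset into a Python string index."""
    units = 0
    for i, ch in enumerate(line):
        if units >= utf16_col:
            return i
        # Characters outside the BMP occupy two UTF-16 code units.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(line)

line = "x = '🙂' + y"
print(line[utf16_col_to_index(line, 11)])  # 'y', after a UTF-16 → index conversion
print(line[10])                            # 'y', a UTF-32 column is just the index
```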
As an example, the LSP client `eglot` (from Emacs) advertises the following encoding order: `position_encodings=['utf-32', 'utf-8', 'utf-16']`. Yet the resulting encoding chosen by `pygls` ends up being `utf-16`.
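For illustration, a minimal sketch of preference-order negotiation, assuming the same `server_cap` / `client_capabilities` attributes as the snippet above (not a tested patch against the actual `pygls` sources):

```python
def _with_position_encodings(self):
    # Fall back to UTF-16, which servers are required to support.
    self.server_cap.position_encoding = types.PositionEncodingKind.Utf16

    general = self.client_capabilities.general
    if general is None or general.position_encodings is None:
        return self

    supported = {
        types.PositionEncodingKind.Utf16,
        types.PositionEncodingKind.Utf32,
        types.PositionEncodingKind.Utf8,
    }

    # Honour the client's preference order: pick the first encoding it
    # lists that the server also supports.
    for encoding in general.position_encodings:
        if encoding in supported:
            self.server_cap.position_encoding = encoding
            return self

    logger.warning(f"Unknown `PositionEncoding`s: {general.position_encodings}")
    return self
```

With this shape, `eglot`'s `['utf-32', 'utf-8', 'utf-16']` would yield `utf-32`, while a client that only advertises `utf-16` (or nothing at all) would still get the mandatory fallback.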