`pygls` chooses utf-16 encoding when client prefers utf-32 (which would be faster for `pygls` as well)
As an optimization, I feel that `pygls` should choose the `utf-32` encoding if the editor prefers it over `utf-16`.
Looking a bit at the code, it looks like `_with_position_encodings` is responsible:
```python
def _with_position_encodings(self):
    self.server_cap.position_encoding = types.PositionEncodingKind.Utf16

    general = self.client_capabilities.general
    if general is None:
        return self

    encodings = general.position_encodings
    if encodings is None:
        return self

    if types.PositionEncodingKind.Utf16 in encodings:
        return self

    if types.PositionEncodingKind.Utf32 in encodings:
        self.server_cap.position_encoding = types.PositionEncodingKind.Utf32
        return self

    if types.PositionEncodingKind.Utf8 in encodings:
        self.server_cap.position_encoding = types.PositionEncodingKind.Utf8
        return self

    logger.warning(f"Unknown `PositionEncoding`s: {encodings}")
    return self
```
The code here looks like it does encoding negotiation. However, because the function checks whether the client supports UTF-16 before ever consulting the client's preference order, in practice the outcome will always be UTF-16, unless the editor explicitly hides the fact that it supports UTF-16 (which it is required to support). This holds even when both parties would have agreed on a better alternative. Notably, UTF-32 is advantageous for `pygls`, since it makes all position-related operations trivial: Python strings are already indexed by code points.
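To illustrate that claim (a standalone sketch, not `pygls` code; the helper `utf16_col_to_index` is hypothetical), a UTF-16 column has to be converted by walking the line, while a UTF-32 column is simply a Python string index:

```python
def utf16_col_to_index(line: str, utf16_col: int) -> int:
    """Convert a UTF-16 code-unit offset into a Python string index."""
    units = 0
    for i, ch in enumerate(line):
        if units >= utf16_col:
            return i
        # Characters outside the BMP occupy two UTF-16 code units.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(line)

line = "x = '🙂' + y"
print(line[utf16_col_to_index(line, 11)])  # 'y', after a UTF-16 → index conversion
print(line[10])                            # 'y', a UTF-32 column is just the index
```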
As an example, the LSP client `eglot` (from Emacs) advertises the following encoding order: `position_encodings=['utf-32', 'utf-8', 'utf-16']`. Yet the resulting encoding chosen by `pygls` ends up being `utf-16`.
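For illustration, a minimal sketch of preference-order negotiation, assuming the same `server_cap` / `client_capabilities` attributes as the snippet above (not a tested patch against the actual `pygls` sources):

```python
def _with_position_encodings(self):
    # Fall back to UTF-16, which servers are required to support.
    self.server_cap.position_encoding = types.PositionEncodingKind.Utf16

    general = self.client_capabilities.general
    if general is None or general.position_encodings is None:
        return self

    supported = {
        types.PositionEncodingKind.Utf16,
        types.PositionEncodingKind.Utf32,
        types.PositionEncodingKind.Utf8,
    }

    # Honour the client's preference order: pick the first encoding it
    # lists that the server also supports.
    for encoding in general.position_encodings:
        if encoding in supported:
            self.server_cap.position_encoding = encoding
            return self

    logger.warning(f"Unknown `PositionEncoding`s: {general.position_encodings}")
    return self
```

With this shape, `eglot`'s `['utf-32', 'utf-8', 'utf-16']` would yield `utf-32`, while a client that only advertises `utf-16` (or nothing at all) would still get the mandatory fallback.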