Formatter should use UTF-16 string representation
miXwui opened this issue · 1 comments
There is a mismatch in the character positions exchanged between elixir-ls and VSCode (and, I assume, all other editors): each side counts characters differently.
Some intuition:
In JS: '😀'.length === 2
(because strings are represented in UTF-16)
In Elixir: String.length("😀") === 1
(because strings are UTF-8 binaries and String.length/1 counts graphemes)
LSP expresses positions in text documents as UTF-16 code unit offsets:
https://microsoft.github.io/language-server-protocol/specification#textDocuments
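To make the difference concrete, here is a minimal sketch that counts the same string three ways. `LengthDemo` and `utf16_units/1` are hypothetical names made up for illustration; they are not part of elixir-ls. The only library call used is Erlang's `:unicode.characters_to_binary/3`.

```elixir
# Hypothetical helper (not part of elixir-ls), for illustration only.
defmodule LengthDemo do
  # Number of UTF-16 code units in a UTF-8 string. This is the unit
  # that JS `.length` counts and that LSP positions are expressed in.
  def utf16_units(string) do
    string
    |> :unicode.characters_to_binary(:utf8, :utf16)
    |> byte_size()
    |> div(2)
  end
end

smiley = "\u{1F600}" # 😀

IO.inspect(String.length(smiley))          # graphemes: 1
IO.inspect(byte_size(smiley))              # UTF-8 bytes: 4
IO.inspect(LengthDemo.utf16_units(smiley)) # UTF-16 code units: 2
```

For plain ASCII text all three counts agree, which is probably why the mismatch only shows up once a line contains characters outside the Basic Multilingual Plane or combining sequences.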
Repro:
Paste this into a blank Elixir file:
IO.inspect "😀"
Format file (default Alt+Shift+F in VSCode or equivalent)
It will format into this (notice the parenthesis placement):
IO.inspect("😀)"
How it should format:
IO.inspect("😀")
Likewise, this line:
IO.puts "🏳️🌈"
turns into this:
IO.puts("🏳)️🌈"
because 🏳️🌈 has length 6 in UTF-16, so the misplaced edit splits the grapheme.
Similar behavior happens for Zalgo strings like so: ẕ̸͇̞̲͇͕̹̙̄͆̇͂̏̊͒̒̈́́̕͘͠͝à̵̢̛̟̞͚̟͖̻̹̮̘͚̻͍̇͂̂̅́̎̉͗́́̃̒l̴̻̳͉̖̗͖̰̠̗̃̈́̓̓̍̅͝͝͝g̷̢͚̠̜̿̊́̋͗̔ȍ̶̹̙̅̽̌̒͌͋̓̈́͑̏͑͊͛͘ ̸̨͙̦̫̪͓̠̺̫̖͙̫̏͂̒̽́̿̂̊́͂͋͜͠͝͝ṭ̴̜͎̮͉̙͍͔̜̾͋͒̓̏̉̄͘͠͝ͅę̷̡̭̹̰̺̩̠͓͌̃̕͜͝ͅͅx̵̧͍̦͈͍̝͖͙̘͎̥͕̾̾̍̀̿̔̄̑̈͝t̸̛͇̀̕
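A rough sketch of what I think is going on in the repro, assuming the server reports a column counted Elixir-style and the editor reads that same number as a UTF-16 offset. `Utf16OffsetDemo` and `split_at_utf16/2` are hypothetical names for illustration, not the actual elixir-ls code or the fix, and the helper assumes the offset does not fall inside a surrogate pair or past the end of the line.

```elixir
# Hypothetical helper (not part of elixir-ls), for illustration only.
defmodule Utf16OffsetDemo do
  # Split `line` at a UTF-16 code unit offset, the way an LSP client
  # would interpret a `character` value. Returns {before, rest}.
  # Assumption: the offset is within the line and on a code point boundary.
  def split_at_utf16(line, utf16_offset) do
    utf16 = :unicode.characters_to_binary(line, :utf8, :utf16)
    byte_offset = utf16_offset * 2
    <<before::binary-size(byte_offset), rest::binary>> = utf16

    {:unicode.characters_to_binary(before, :utf16, :utf8),
     :unicode.characters_to_binary(rest, :utf16, :utf8)}
  end
end

line = ~s(IO.inspect "\u{1F600}")

# Counted in graphemes, the end of the line is column 14.
IO.inspect(String.length(line))

# Read as a UTF-16 offset, column 14 lands before the closing quote,
# which is where the stray ")" ends up in the repro.
IO.inspect(Utf16OffsetDemo.split_at_utf16(line, 14))

# The end of the line in UTF-16 code units is column 15.
IO.inspect(Utf16OffsetDemo.split_at_utf16(line, 15))
```

If that reading is right, positions need to be converted to UTF-16 code units before they are sent to the editor, and converted back when they come in.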
I have a fix, will open a PR :)
Related:
#165
microsoft/language-server-protocol#376
Environment
* Elixir & Erlang versions (elixir --version): Erlang/OTP 22 [erts-10.7] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]
Elixir 1.10.2 (compiled with Erlang/OTP 21)
* VSCode ElixirLS Fork version: 0.3.2
* Operating System Version: linux 4.4.0-18362-Microsoft
Windows 10, WSL on Ubuntu 18.04.02 LTS
Whoops, I accidentally created this in the wrong repo (this one is old). Closing in favor of:
elixir-lsp/elixir-ls#199