JakeBecker/elixir-ls

Formatter should use UTF-16 string representation

miXwui opened this issue · 1 comments

There is a mismatch in how character positions are counted when edits are handed off from elixir-ls to VSCode and vice versa (and, I assume, with all other editors).

Some intuition:
In JS: '😀'.length === 2 (strings are UTF-16, and length counts UTF-16 code units)
In Elixir: String.length("😀") === 1 (strings are UTF-8 binaries, and String.length counts graphemes)

LSP text document positions are expressed in UTF-16 code units:
https://microsoft.github.io/language-server-protocol/specification#textDocuments
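
To make the intuition concrete, here is a small sketch that measures the same string in the units involved. `LengthDemo.utf16_length/1` is a hypothetical helper written for this illustration, not something from elixir-ls; it relies on Erlang's `:unicode.characters_to_binary/3` to re-encode the string as UTF-16.

```elixir
defmodule LengthDemo do
  # Hypothetical helper: number of UTF-16 code units needed to encode `string`.
  # Each UTF-16 code unit is 2 bytes, so the re-encoded byte size is halved.
  def utf16_length(string) do
    string
    |> :unicode.characters_to_binary(:utf8, :utf16)
    |> byte_size()
    |> div(2)
  end
end

emoji = "😀"

IO.inspect(String.length(emoji))               # graphemes: 1
IO.inspect(length(String.codepoints(emoji)))   # code points: 1
IO.inspect(byte_size(emoji))                   # UTF-8 bytes: 4
IO.inspect(LengthDemo.utf16_length(emoji))     # UTF-16 code units: 2
```

Only the last number matches what a UTF-16 based client such as VSCode counts for this character.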

Repro:

Paste this into a blank Elixir file:

IO.inspect "😀"

Format file (default Alt+Shift+F in VSCode or equivalent)
It will format into this (notice the parenthesis placement):

IO.inspect("😀)"

How it should format:

IO.inspect("😀")
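
One way to read the repro, assuming the column for the closing parenthesis is counted in graphemes on the server while the editor interprets it in UTF-16 code units: the end of the original line is column 14 in graphemes but column 15 in UTF-16, so the parenthesis lands one unit too early. The sketch below shows the conversion; `PositionSketch.grapheme_col_to_utf16/2` is a hypothetical helper for illustration and is not the fix from the PR.

```elixir
defmodule PositionSketch do
  # Hypothetical helper, not taken from elixir-ls: converts a column counted in
  # graphemes into a column counted in UTF-16 code units for one line of text.
  def grapheme_col_to_utf16(line, grapheme_col) do
    line
    |> String.slice(0, grapheme_col)                  # text before the column
    |> :unicode.characters_to_binary(:utf8, :utf16)   # re-encode as UTF-16
    |> byte_size()
    |> div(2)                                         # 2 bytes per code unit
  end
end

original = ~s|IO.inspect "😀"|

# End of line, counted in graphemes.
grapheme_col = String.length(original)
IO.inspect(grapheme_col)                                                  # 14

# The same position, counted the way a UTF-16 based editor counts it.
IO.inspect(PositionSketch.grapheme_col_to_utf16(original, grapheme_col)) # 15
```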

IO.puts "🏳️‍🌈" turns into IO.puts("🏳)️‍🌈", because 🏳️‍🌈 is 6 code units long in UTF-16; the misplaced parenthesis splits the grapheme.
 
Similar behavior happens for Zalgo strings like so: ẕ̸͇̞̲͇͕̹̙̄͆̇͂̏̊͒̒̈́́̕͘͠͝à̵̢̛̟̞͚̟͖̻̹̮̘͚̻͍̇͂̂̅́̎̉͗́́̃̒l̴̻̳͉̖̗͖̰̠̗̃̈́̓̓̍̅͝͝͝g̷̢͚̠̜̿̊́̋͗̔ȍ̶̹̙̅̽̌̒͌͋̓̈́͑̏͑͊͛͘ ̸̨͙̦̫̪͓̠̺̫̖͙̫̏͂̒̽́̿̂̊́͂͋͜͠͝͝ṭ̴̜͎̮͉̙͍͔̜̾͋͒̓̏̉̄͘͠͝ͅę̷̡̭̹̰̺̩̠͓͌̃̕͜͝ͅͅx̵̧͍̦͈͍̝͖͙̘͎̥͕̾̾̍̀̿̔̄̑̈͝t̸̛͇̀̕

I have a fix, will open a PR :)

Related:
#165
microsoft/language-server-protocol#376

Environment

* Elixir & Erlang versions (elixir --version): Erlang/OTP 22 [erts-10.7] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]; Elixir 1.10.2 (compiled with Erlang/OTP 21)
* VSCode ElixirLS Fork version: 0.3.2
* Operating System Version: linux 4.4.0-18362-Microsoft (Windows 10, WSL on Ubuntu 18.04.02 LTS)

Whoops, I accidentally created this in the wrong repo (this one is old). Closing in favor of:
elixir-lsp/elixir-ls#199