Semantic token may return invalid offset

Question

Opened this issue 8 days ago · 1 comments

I'm opening this issue to keep track of the problem, but it may already be fixed, or may not exist at all. I'll try to find time to investigate later, unless this rings a bell for someone else right away.

I enabled semantic tokens a while ago. I actually kinda like them, and what I really love is that they highlight links inside Haddock differently, so I get visual feedback when my Haddock links are broken ;)

Since I enabled semantic tokens, my editor (neovim) sometimes gets stuck for minutes while editing. It is difficult to reproduce: if I undo the latest action and redo it, the hang does not happen again. It seems to happen only after a few edits (or with multiple files open; I really don't know the root cause), and I have never been able to reproduce the hang just by opening a file and repeating the same insert action.

Note that while the editor was hanging, HLS was idle and neovim was the one eating 100% CPU. With a bit of profiling, I tracked the offending function down to neovim's semantic highlighting logic. But because it was really difficult to reproduce and understand, I did not report anything.

Recently I found neovim/neovim#36257, which describes a similar issue: neovim hangs when semantic tokens are highlighted, and the author showed that the problem was linked to invalid offsets returned by the LSP semantic token queries. (In their context, it was something TypeScript related.)

I've applied the neovim patch and I hope these problems will be gone for me from now on. On the other hand, the neovim patch is "defensive" programming: it just ignores offsets that fall outside the file. This still means that the LSP server can send invalid offsets, so there is definitely something that can be improved on the HLS side.
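
To make "invalid offset" concrete, here is a minimal sketch of a defensive check, written in Python for illustration and not taken from the neovim patch. It relies on the LSP encoding of semantic tokens as flat groups of five integers (deltaLine, deltaStart, length, tokenType, tokenModifiers). The function name `decode_and_validate` is hypothetical, and measuring line lengths with `len()` is an assumption: LSP offsets are in UTF-16 code units by default, which only matches for text within the Basic Multilingual Plane.

```python
from typing import List, Tuple

# (line, start_char, length, token_type, token_modifiers), absolute positions
Token = Tuple[int, int, int, int, int]


def decode_and_validate(data: List[int], lines: List[str]) -> Tuple[List[Token], List[Token]]:
    """Decode an LSP semantic-token array and split it into valid and invalid tokens.

    Assumption: len(line) approximates the line length in the units the
    server uses for offsets (true for BMP-only text with UTF-16 offsets).
    """
    valid: List[Token] = []
    invalid: List[Token] = []
    line = 0
    start = 0
    # Tokens are encoded as consecutive groups of five integers;
    # a trailing incomplete group is ignored.
    for i in range(0, len(data) - len(data) % 5, 5):
        delta_line, delta_start, length, token_type, modifiers = data[i:i + 5]
        line += delta_line
        # deltaStart is relative to the previous token only on the same line.
        start = start + delta_start if delta_line == 0 else delta_start
        token = (line, start, length, token_type, modifiers)
        in_range = 0 <= line < len(lines) and start + length <= len(lines[line])
        (valid if in_range else invalid).append(token)
    return valid, invalid
```

A client could log the `invalid` list instead of silently dropping it, which is roughly the kind of data that would help narrow this down on the server side.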

Ping @soulomoon, as the semantic token wizard ;)

(I've patched my neovim so that it logs when an invalid value is produced by HLS, so I hope to be back soon with more data, but I did not want this information to be lost.)

Your environment

Which OS do you use?
linux
Which version of GHC do you use and how did you install it?
GHC 9.12 on my work codebase; I also observed the same with other versions of GHC (recently 9.10, and 9.8 when working on the HLS codebase itself)
How is your project built (alternative: link to the project)?
cabal or nix

Which LSP client (editor/plugin) do you use?
neovim native lsp client
Which version of HLS do you use and how did you install it?
2.11.0.0 from nixpkgs, 2.10.0.0 manually built
Have you configured HLS in any way (especially: a hie.yaml file)?

Steps to reproduce

Not yet known

Expected behaviour

Semantic tokens should not emit invalid offsets

Actual behaviour

Not confirmed, but semantic tokens may be generating invalid offsets

Debug information

Answer 1 · 2025-11-02T04:41:53.000Z

Thanks for the investigation, @guibou.

I suspect this happens because semantic tokens depend on position mappings whenever the current version of the file fails to typecheck.
The file change notifications must precisely match the actual file edits; otherwise, the token ranges can become inconsistent.

For example, if the file is edited externally and the LSP server doesn’t receive a corresponding didChange notification, the position data will go out of sync, breaking the mapping between tokens and source code.

Position mappings are necessary because semantic tokens are based on the last successfully typechecked version of the module.
When the file changes, we need to apply these mappings to translate the old token positions to the new ones, ensuring that the highlights still align with the current text.
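
To illustrate the idea, here is a minimal sketch of mapping a position through a single edit. It is written in Python for illustration only and is not HLS's actual implementation; the function name `map_position` and the choice to return `None` for positions inside the replaced range are assumptions.

```python
from typing import Optional, Tuple

Position = Tuple[int, int]  # (line, character), both zero-based


def map_position(pos: Position, start: Position, end: Position, new_text: str) -> Optional[Position]:
    """Map a position in the old text to the text after one edit.

    The edit replaces the old range [start, end) with new_text.
    Returns None when the position fell inside the replaced range.
    """
    if pos < start:
        return pos  # before the edit: unchanged
    if pos < end:
        return None  # inside the replaced text: no meaningful new position
    new_lines = new_text.split("\n")
    # Where the inserted text ends in the new document.
    if len(new_lines) == 1:
        new_end = (start[0], start[1] + len(new_lines[0]))
    else:
        new_end = (start[0] + len(new_lines) - 1, len(new_lines[-1]))
    line, char = pos
    if line == end[0]:
        # Same line as the end of the edit: the column shifts with it.
        return (new_end[0], new_end[1] + (char - end[1]))
    # Later lines only move up or down by the change in line count.
    return (line + (new_end[0] - end[0]), char)
```

If an edit never reaches the server (for example, an external change without a matching didChange), no such mapping is applied for it, so a token position computed against the old text can point past the end of a line or of the file in the new text.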