Neovim tree sitter support
Closed this issue · 20 comments
Hi, Neovim has tree-sitter syntax tree based highlighting as an alternative to regex based highlighting. Enabling it disables the regex based highlighting. You can enable both, but that results in clashing colors from both highlighters. That means vim-css-color's highlighting is also disabled.
So my question is: Would it be possible to add tree-sitter support for vim-css-color, and if that's not feasible, does anyone know of a tree-sitter compatible plugin that does the same?
An alternative that I have considered would be to disable all but vim-css-color regex highlighting and then enable both tree-sitter and the regex highlighting.
As for supporting tree-sitter, here is what I know: One point where one can hook in is in the queries on the already existing syntax tree for a language in a file. That would likely go in the after/queries dir and consist of Scheme files. Though, for one, I am not sure if those queries support regex, and I imagine it would be difficult to do without hard-coding colors. (Which would be kinda insane.)
The only other way I could see is to add those things directly to the parser for a given language, but that would then be language specific.
So right now I don't see an easy way to implement this myself.
I don’t use NeoVim myself (so e.g. this wasn’t on my radar) but my basic stance is that I’d certainly be interested in supporting it – as long as it’s not at the cost of Vim support, i.e. it must be possible to support both at the same time. Whether that can be made to work, I have no idea so far (as I’m only just hearing about it and haven’t even googled it yet — too busy to look into it right now, but wanted to give a quick statement), but I’m (blindly) assuming it will be. I’ll be taking a look when I can.
Actually thinking more about it, I think it is possible to do. Syntax highlighting is just one of the things plugins do with the generated syntax tree. There's also things like doing indentation based on it, a scope preview, folding... So there is most likely a programmatic interface that could be used to find strings of interest and then just treat them as with the regex highlighting, probably independent of language as well.
Oh that part is not in question at all. What I don’t yet strictly know is whether it’s feasible to support both syntax highlighting interfaces in the same codebase (though I’d be surprised if the answer is no).
No promises, but if I find the time I can also look further into it and maybe put in a PR. I at least did some research into tree-sitter when making my own theme compatible with it.
Hmm. Taking another look now (with @mikehaertl’s issue as the impetus) I guess the question is, if you have a tree-sitter parser for a language, how do you add bits to it in defined places?
Vim’s syntax engine is designed to allow syntaxes to be both subsumed and extended, and CSS Color makes use of that to hook its color name pattern into specific bits of the syntax rules, so that color names will be recognised (roughly) inside string literals and comments, and not elsewhere. And what’s added there is a dummy highlight group which serves as a hook only; the ability to extend the syntax rules is used again to add a rule for each color to that highlight group. So the ability to change the parse rules dynamically and to specify how to extend them is crucial.
With tree-sitter, how would one do that? So far I’m finding a lot of marketing material and various plugins and syntaxes but not a lot of documentation of the interfaces by which it all hangs together. Is there any top-to-bottom documentation remotely like :help syntax.txt that explains the whole system? Or even just a rough draft of something like that.
I know that you can inject languages in other languages with tree sitter as well. So I imagine one would have to go to the maintainers of the separate language parsers if there isn't some way to add it after the fact. The only documentation I've used is this one for Emacs, which describes writing queries. https://emacs-tree-sitter.github.io/syntax-highlighting/queries/
Those are the same as you would use in Vim. You can also look at what I wrote in my own config so far: https://github.com/kmoschcau/nvim-config/tree/main/after/queries
Other than that, basically this is the official documentation. https://tree-sitter.github.io/tree-sitter/
I have also used this extensively when writing my queries: https://github.com/nvim-treesitter/playground
It helps with understanding what the syntax tree looks like and might be a good point to look into how tree sitter modules interact with it.
@ap I can only give you a rough summary as I went from "0 knowledge" to being able to contribute a new language to the nvim-treesitter plugin over the last couple of days.
Treesitter at its core is a bunch of fast parsers written in C. There's a parser for every supported language. They all produce a tree structure as output that is similar to an AST. This is very different from a pure regex-based matcher, as such a tree makes it possible to understand what a specific piece of code is.
Here's an example of what such a tree structure looks like:
The tree also contains information about the exact location where in the text a node starts and ends (here in brackets).
This tree structure can be traversed and queried just like e.g. the DOM. A query could for example ask for all nodes of type (name) that are children of (variable_name).
These queries are used to create the actual highlighting.
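To make the "traverse and query like the DOM" idea concrete, here is a minimal sketch in plain Python. It does not use the tree-sitter library at all: the Node class, the helper names and the tree itself are hypothetical, built by hand for a one-line source. The node types (variable_name) and (name) are the ones mentioned above; everything else is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Toy stand-in for a syntax-tree node (not the real tree-sitter type)."""
    type: str
    start: Tuple[int, int]  # (row, column), zero-based
    end: Tuple[int, int]    # (row, column), end-exclusive
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def children_of_type(root: Node, parent_type: str, child_type: str) -> List[Node]:
    """Return every `child_type` node that is a direct child of a `parent_type` node."""
    return [
        child
        for parent in walk(root)
        if parent.type == parent_type
        for child in parent.children
        if child.type == child_type
    ]


# Hand-built, hypothetical tree for the source line: $color = "#ff0000";
tree = Node("program", (0, 0), (0, 19), [
    Node("assignment_expression", (0, 0), (0, 18), [
        Node("variable_name", (0, 0), (0, 6), [
            Node("name", (0, 1), (0, 6)),
        ]),
        Node("string", (0, 9), (0, 18)),
    ]),
])

matches = children_of_type(tree, "variable_name", "name")
for node in matches:
    print(node.type, node.start, node.end)
```

A real highlight query expresses the same parent/child relationship declaratively in a Scheme-like query file instead of a hand-written loop, and the captured nodes carry the start and end positions that the highlighter needs.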
The plugin has a predefined set of highlight groups. They are the same for every language:
https://github.com/nvim-treesitter/nvim-treesitter/blob/04a48e317e7ae39decf67ecc7fb2c9eacf2a2ad0/doc/nvim-treesitter.txt#L482
For example TSComment is the highlight group used for comments - no matter which language.
For each language there's a set of highlight queries that selects nodes from the tree and maps them to one of these predefined highlight groups. (Actually the queries create a so-called "capture group" like @comment which again is then used to assign the corresponding highlight group TSComment.)
Now to this issue here:
If I understand this right, there's an after/syntax file for every supported filetype here. It loads after a hl was applied and adds color highlighting on top. It uses the language specific highlight groups to find strings, comments, etc.
This should be much easier with nvim-treesitter as there is only a predefined set of common highlight groups. I tried by simply adding e.g. TSString or TSText to after/syntax/php.vim here - but it did not work.
But - I have no idea how they override the default syntax highlighting. It must be some Lua trickery but I'm not yet really familiar with Lua. So I couldn't figure out how it's done and why after/syntax does not work anymore.
I know it's not specific to Lua, you can also create those highlight regions in a buffer with VimL, they also have a specific name that I currently can't recall. There are a couple of functions in Neovim's VimL API, I just can't remember the name. Something like match group or similar. It works a bit like the :match command, only it can take positions in a buffer instead of a regex. Also it has the advantage of being able to layer them. For example if you put a foreground highlight on top of a background highlight, the background highlight isn't replaced.
I asked the neovim-treesitter devs about the core mechanism.
which nvim features are used to let it do what it does?
set syntax off
it's even supported on Vim. The highlights are set using extmarks
The extmarks API is documented here: https://neovim.io/doc/user/api.html#extmarks. The docs are not really extensive though and it's unclear to me how this is used to set highlighting. I'm also unsure if it's even possible to add custom highlighting on top somehow.
A workaround is to set additional_vim_regex_highlighting = true in the nvim-treesitter config. But this can have unwanted side effects. It's also inefficient to have 2 highlight systems work in parallel.
More findings:
- nvim-treesitter uses the builtin treesitter api (https://github.com/nvim-treesitter/nvim-treesitter/blob/master/lua/nvim-treesitter/highlight.lua#L4)
- The builtin treesitter module can mostly take care of the highlighting automatically. It must be fed with the TS queries and the corresponding hl groups
- Internally it uses nvim_buf_set_extmark() to apply the highlights.
- These extmarks can receive an optional highlight priority in case several hl are applied to the same position
Here's an interesting detail:
Tree-sitter uses |nvim_buf_set_extmark()| to set highlights with a default
priority of 100. This enables plugins to set a highlighting priority lower or
higher than tree-sitter.
In summary what we could do:
- After TS is initialized we could query all nodes relevant for color codes (TSString, TSComment, ...) and scan them to find the positions of color codes to highlight
- Set custom extmarks on the found positions with nvim_buf_set_extmark(), using the respective hl group and a higher priority than treesitter
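The scanning half of that plan can be sketched independently of Neovim. The following plain-Python sketch takes the text spans that a tree query would have returned for string and comment nodes and computes where each color code sits. All names here (Mark, find_color_marks, HEX_COLOR) are hypothetical. The pattern is simplified to 3- and 6-digit hex colors, spans are assumed not to cross line boundaries, and columns are character offsets, which only equal the byte offsets Neovim works with for ASCII text. The priority of 100 is the tree-sitter default quoted above. The actual nvim_buf_set_extmark() call is deliberately left out.

```python
import re
from typing import List, NamedTuple, Sequence, Tuple

# Simplified pattern (assumption): only #rgb and #rrggbb, no color names or rgb().
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")

# Default tree-sitter highlight priority, as quoted from the docs above.
TREESITTER_PRIORITY = 100


class Mark(NamedTuple):
    """Where a color highlight would go; a stand-in for one extmark."""
    row: int
    col_start: int
    col_end: int  # end-exclusive
    color: str
    priority: int


def find_color_marks(
    lines: Sequence[str],
    spans: Sequence[Tuple[int, int, int]],
    priority: int = TREESITTER_PRIORITY + 1,
) -> List[Mark]:
    """Scan each (row, col_start, col_end) span for hex colors.

    The spans stand in for the string and comment nodes a tree query
    would return; text outside of them is never scanned.
    """
    marks = []
    for row, col_start, col_end in spans:
        segment = lines[row][col_start:col_end]
        for match in HEX_COLOR.finditer(segment):
            marks.append(Mark(
                row,
                col_start + match.start(),  # translate back to line columns
                col_start + match.end(),
                match.group(0).lower(),
                priority,
            ))
    return marks


lines = ['$color = "#ff0000";', "// fallback: #0AF"]
spans = [(0, 9, 18), (1, 0, 17)]  # one string node, one comment node
marks = find_color_marks(lines, spans)
for mark in marks:
    print(mark)
```

In a plugin, each resulting Mark would be handed to nvim_buf_set_extmark() together with a highlight group for that color. Using a priority one above the tree-sitter default is what would keep the color highlight on top of the regular string or comment highlighting.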
I'm not sure if this is really feasible in this plugin. As this is a totally different concept it's probably best to create a new plugin for this.
Extmarks was the thing I couldn't remember! I also know that the Omnisharp plugin uses this to add syntax highlights based on symbol type, by actually understanding what each symbol holds.
But yes overall I agree it might be better to write this as a separate plugin, maybe as a tree sitter module instead of putting it in here. The logic could probably be shared, but that's about it.
@kmoschcau this one is working https://github.com/RRethy/vim-hexokinase
I use norcalli/nvim-colorizer.lua. It's zero dependencies! (while RRethy/vim-hexokinase requires golang)
Does this work together with tree-sitter highlighting?
Yes in my case :)
I just tried it, it works like a charm. Thanks for the tip! @ap I think we can close this now unless you want to try your hands at getting this into vim-css-color.
I’m not sure I follow what the conclusion here is… (perhaps because I don’t use NeoVim enough and haven’t checked out these plugins yet…) can you summarise?
The solution was that I switched to the nvim-colorizer.lua plugin that @hankchiutw linked. So basically from my perspective this issue can be closed since there is a workaround (a different plugin). I just didn't want to simply close the issue, in case you want to use it to keep track of future efforts.
I see.
Well, I’m still interested in making CSS Color integrate with NeoVim plugins, so I’m inclined to keep the issue open.
However, if you would like it closed in order to declutter your own overview of open issues, I’m OK with that; I would open another ticket for the same issue and link to this one from there, and this one can be closed.
Yeah that's fine by me as well. I'm going to close this issue then.