Delay in opening CMake files with nvim-treesitter
cassava opened this issue · 15 comments
I am using:
- neovim 0.8
- nvim-treesitter main
- tree-sitter-cmake main
When opening any CMake file with this treesitter parser installed and highlighting enabled, I notice a large delay of more than 1.5 s before the file contents are visible. Normally, a file is displayed within ~150 ms.
- If I uninstall the cmake treesitter parser, the problem no longer occurs.
- If I disable the treesitter highlight module, the problem no longer occurs.
- If I downgrade treesitter, the problem remains.
- If I downgrade the cmake treesitter parser, the delay is shorter, but still significant.
Can you share your nvim config and the CMake file you're opening?
Never mind, I also notice the delay when opening a CMake file now (not as significant as 1.5 s, but certainly noticeable). I simplified the queries and it seems to help tremendously. Now I am just waiting for upstream to accept the change.
My config is basically LazyNvim with the cmake parser, so I was planning on creating a simplified version to share. On another machine of mine the delay is a little less noticeable.
I noticed there are some other issues that may be related:
But I haven't been able to look into them yet.
Yeah, it's most likely due to the size of the queries. I am working on a simpler version of the queries.
So how does this work? I somehow assumed all the CMake support lived here, but apparently not, if you need to make a PR to nvim-treesitter. Do you have a link to documentation, or could you give me a quick rundown of what's in this repo and what's in nvim-treesitter?
Ah, so this is just the grammar for the syntax tree, and everything needed to actually do something with it in nvim, like highlighting, lives in nvim-treesitter?
I.e., this is the repo for the parser, and any queries are "module"-specific (the highlighter being one such module), and those are specific to each editor?
I'm not sure that applies to other editors, but for nvim, yes, that's the gist of it.
Welp, I updated the queries, but it doesn't seem to improve the startup time much. I'm not sure there's anything more I can do for now.
There must be some exponential parsing problem in the grammar. I use this grammar for other purposes and have the following query to match commands with two arguments (e.g. option(NAME ON/OFF)):
```scheme
(normal_command
  (identifier) @command
  (argument (unquoted_argument) @name)
  (argument (unquoted_argument) @state)
) @whole
```
Without the @state capture, some 1000 options are parsed in ~100 ms; with the @state capture, it takes 6000 ms.
Thanks for the hint. I will play around to see if I can find the problem.
I'm not sure how tree-sitter works under the hood, but now that I think about it, I'm guessing that the grammar only produces the tree, and the queries are then run against that tree. So if changing a query changes the performance, it may not be the grammar's fault, but I will investigate some more.
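To illustrate the split guessed at above (the grammar builds the tree; queries are separate passes over the finished tree), here is a toy model in plain Python. It is not tree-sitter's implementation or API: `Node`, `parse_command` and `captures` are hypothetical names, and "capturing" is reduced to collecting the text of every node of one kind.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a syntax-tree node (not tree-sitter's API)."""
    kind: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def parse_command(source: str) -> Node:
    """Toy 'grammar': build a tree for 'name(arg1 arg2 ...)' exactly once."""
    name, _, rest = source.partition("(")
    # Drop everything from the last ')' on, then split on any whitespace.
    args = rest.rsplit(")", 1)[0].split()
    arguments = [Node("argument", children=[Node("unquoted_argument", a)]) for a in args]
    return Node("normal_command", children=[Node("identifier", name.strip()), *arguments])


def captures(tree: Node, kind: str) -> list[str]:
    """Toy 'query': a separate depth-first pass over an already-built tree,
    collecting the text of every node of the given kind in source order."""
    found: list[str] = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.kind == kind:
            found.append(node.text)
        # Push children reversed so they are visited left to right.
        stack.extend(reversed(node.children))
    return found


tree = parse_command("list(APPEND FOO\nBAR0\nBAR1\n)")  # parse once
print(captures(tree, "identifier"))         # ['list']
print(captures(tree, "unquoted_argument"))  # ['APPEND', 'FOO', 'BAR0', 'BAR1']
```

Because the tree is built once and each query is its own pass over it, a change in query cost can show up without any change to the grammar.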
Interestingly, if I run the query above using `tree-sitter query query.scm CMakeLists.txt`, it's quick. It's only when I use the generated Rust parser API that it slows to a crawl. I will try to investigate more as well.
Not sure if it's really the grammar or the query, but given this generator script, gen.py,
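```python
import argparse

# Generate a CMakeLists.txt containing a single list(APPEND FOO ...) call
# with N unquoted arguments BAR0 .. BAR{N-1}, one per line.
parser = argparse.ArgumentParser()
parser.add_argument("-n", required=True, type=int)
args = parser.parse_args()

print("list(APPEND FOO")
for i in range(args.n):
    print(f"BAR{i}")
print(")")
```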
and a query like this, saved as query.scm,
```scheme
(normal_command
  (identifier) @command
  (argument (unquoted_argument) @one)
  (argument (unquoted_argument) @two)
  (argument (unquoted_argument) @three)
)
```
with varying numbers of matched arguments, and a command like
```sh
python3 gen.py -n N > CMakeLists.txt && time tree-sitter query query.scm CMakeLists.txt
```
one quickly sees timings like these (in seconds, as reported by time):
| N | one arg | two args | three args |
|---|---|---|---|
| 10 | 0.003 | 0.015 | 0.022 |
| 40 | 0.003 | 0.019 | 0.431 |
| 80 | 0.003 | 0.034 | 11.158 |
| 100 | 0.003 | 0.027 | |
| 200 | 0.003 | 0.109 | |
| 400 | 0.003 | 0.785 | |
| 800 | 0.004 | 6.153 | |
But the real kicker, and the reason for that behaviour, is that the query returns a match for each and every argument; e.g., for the one-arg case you see output like this:
```text
CMakeLists.txt
  pattern: 0
    capture: 0 - command, start: (0, 0), end: (0, 4), text: `list`
    capture: 1 - one, start: (0, 5), end: (0, 11), text: `APPEND`
  pattern: 0
    capture: 0 - command, start: (0, 0), end: (0, 4), text: `list`
    capture: 1 - one, start: (0, 12), end: (0, 15), text: `FOO`
  pattern: 0
    capture: 0 - command, start: (0, 0), end: (0, 4), text: `list`
    capture: 1 - one, start: (1, 0), end: (1, 4), text: `BAR0`
  pattern: 0
    capture: 0 - command, start: (0, 0), end: (0, 4), text: `list`
    capture: 1 - one, start: (2, 0), end: (2, 4), text: `BAR1`
  pattern: 0
    capture: 0 - command, start: (0, 0), end: (0, 4), text: `list`
    capture: 1 - one, start: (3, 0), end: (3, 4), text: `BAR2`
  …
```
instead of just a single match for list APPEND. It's certainly unexpected, from what I can tell.
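A rough way to see why the times grow so fast, based on my reading of tree-sitter's query semantics (an inference, not something stated in this thread): without anchors, sibling patterns match any in-order selection of children, not just adjacent ones, so k argument captures on a command with n arguments yield C(n, k) matches. The one-arg output above (one match per argument) is the k = 1 case. This plain-Python sketch counts matches for the file gen.py produces, which has N + 2 arguments (APPEND, FOO and the N BARi lines); the helper names are hypothetical.

```python
from itertools import combinations
from math import comb


def unanchored_match_count(n_args: int, k_captures: int) -> int:
    """Number of matches when k unanchored sibling patterns each capture one of
    n_args arguments: any in-order choice of k arguments is a separate match."""
    return comb(n_args, k_captures)


def generated_arg_count(n: int) -> int:
    """gen.py emits APPEND, FOO and BAR0..BAR{n-1} as arguments of one list()."""
    return n + 2


# Brute-force check of the formula on a small case.
assert len(list(combinations(range(12), 3))) == unanchored_match_count(12, 3)

for n in (10, 40, 80, 100, 200, 400, 800):
    n_args = generated_arg_count(n)
    print(n, [unanchored_match_count(n_args, k) for k in (1, 2, 3)])
```

For N = 80 this gives 88,560 matches with three captures, and for N = 800 it gives 321,201 with two. The count alone does not explain the absolute timings, though: N = 400 with two captures (80,601 matches) ran far faster than N = 80 with three (88,560 matches), so treat this as a partial explanation of the growth rather than a cost model.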
Edit: with that information, one can rewrite the queries using the anchor operator . to constrain the matches to the expected number of arguments, and everything behaves as expected. I'm not sure anything needs to be done here.
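A toy simulation of what anchoring changes, in plain Python rather than tree-sitter's matcher. It assumes the anchored query looks something like `(normal_command (identifier) @command . (argument (unquoted_argument) @name) . (argument (unquoted_argument) @state) .)` (an untested illustration): anchors between patterns require immediate siblings, and a trailing `.` requires the last capture to be the last named child. The function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def unanchored_matches(args: list[str], k: int) -> list[tuple[str, ...]]:
    """Toy model of k unanchored sibling patterns: any in-order choice of k
    arguments matches, whether or not the arguments are adjacent."""
    return list(combinations(args, k))


def anchored_matches(args: list[str], k: int, anchor_end: bool = False) -> list[tuple[str, ...]]:
    """Toy model of anchoring every pattern to the previous one, e.g.
    (identifier) . (argument) . (argument): the captures must be the first k
    arguments. anchor_end=True models a trailing '.', which also requires the
    last capture to be the last argument, i.e. exactly k arguments."""
    if len(args) < k or (anchor_end and len(args) != k):
        return []
    return [tuple(args[:k])]


option_args = ["NAME", "ON"]
list_args = ["APPEND", "FOO", "BAR0", "BAR1"]
print(len(unanchored_matches(list_args, 2)))              # 6 pairs
print(anchored_matches(list_args, 2))                     # [('APPEND', 'FOO')]
print(anchored_matches(list_args, 2, anchor_end=True))    # []
print(anchored_matches(option_args, 2, anchor_end=True))  # [('NAME', 'ON')]
```

In this model the unanchored pattern's match count grows with every extra argument, while the fully anchored one yields at most one match per command, which is the behaviour the edit above describes.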
Thank you for doing the investigation; it's very useful. For the three-arg query you've written, I think that's the expected behaviour for a query written like that. As for the spike at 800 statements with two args, I have no idea why that happens, but 800 statements is also a large number. Could you open an issue in the tree-sitter repo to ask about this problem?
We do have long lists of sources, maybe not 800 but certainly in the hundreds. I will look a bit deeper into the parser generator, but yes, I will open something over there once I understand more.