performance problems in Go code with large number of tree-sitter grammars installed
Closed this issue · 9 comments
Description of Issue
NVIM v0.9.0
Build type: Release
LuaJIT 2.1.0-beta3
system vimrc file: "$VIM/sysinit.vim"
fall-back for $VIM: "/nix/store/h58i7gppfamlnadkwzj35wxc9sklpdsx-neovim-unwrapped-0.9.0/share/nvim"
Scrolling through Go code is fairly quick, but editing results in a pretty lengthy pause. Here's the MatchupShowTimes
output after just entering Insert mode in src/fmt/print.go
(source):
average last maximum
[get_matching_delims]
initial_pair 0.44ms 0.27ms 2.82ms
get_matches 0.49ms 0.32ms 2.87ms
[loader_init_buffer]
done 15.58ms 9.15ms 25.14ms
[loading]
init_done 18.49ms 18.49ms 18.49ms
[loading_module]
misc 0.01ms 0.01ms 0.01ms
motion 0.05ms 0.03ms 0.05ms
matchparen 8.26ms 8.26ms 8.26ms
loader 9.60ms 9.60ms 9.60ms
[matchparen.highlight]
get_matching 5.66ms 1.49ms 20.87ms
end 5.82ms 1.66ms 21.05ms
get_current 803.07ms 1632.90ms 2243.81ms
[s:get_delim]
first_pass 0.14ms 0.12ms 0.28ms
nothing_found 0.16ms 0.13ms 0.32ms
got_results 4.80ms 0.77ms 20.25ms
[ts_engine.get_delim]
fail 802.71ms 1632.50ms 2243.52ms
And here is the result of a func profile:
FUNCTIONS SORTED ON TOTAL TIME
count total (s) self (s) function
53 4.090691 0.007120 4()
48 4.061295 0.000661 matchup#delim#get_current()
48 4.060634 0.001619 <SNR>80_get_delim_multi()
48 4.021120 0.001011 matchup#ts_engine#get_delim()
48 4.019052 <SNR>76_forward()
51 1.451991 0.000517 3()
48 0.037895 0.008010 <SNR>80_get_delim()
16 0.027112 nvim_treesitter#foldexpr()
60 0.024977 0.001351 matchup#ts_syntax#skip_expr()
120 0.022805 <SNR>83_forward()
36 0.021639 0.000912 matchup#delim#skip()
60 0.013530 0.000270 matchup#ts_syntax#lang_skip()
18 0.013403 0.002397 matchup#delim#get_matching()
18 0.010411 0.003533 <SNR>80_get_matching_delims()
60 0.010097 0.000551 matchup#ts_syntax#synID()
1 0.006348 nvim_treesitter#indent()
18 0.005113 0.004808 <SNR>80_parser_delim_new()
24 0.004918 0.000443 matchup#delim#skip0()
264 0.004125 matchup#perf#toc()
240 0.003169 0.002059 <SNR>79_parse_args()
FUNCTIONS SORTED ON SELF TIME
count total (s) self (s) function
48 4.019052 <SNR>76_forward()
16 0.027112 nvim_treesitter#foldexpr()
120 0.022805 <SNR>83_forward()
48 0.037895 0.008010 <SNR>80_get_delim()
53 4.090691 0.007120 4()
1 0.006348 nvim_treesitter#indent()
18 0.005113 0.004808 <SNR>80_parser_delim_new()
264 0.004125 matchup#perf#toc()
18 0.010411 0.003533 <SNR>80_get_matching_delims()
18 0.013403 0.002397 matchup#delim#get_matching()
240 0.003169 0.002059 <SNR>79_parse_args()
18 0.002233 0.002042 <SNR>58_add_matches()
48 4.060634 0.001619 <SNR>80_get_delim_multi()
48 0.001703 0.001611 1()
60 0.024977 0.001351 matchup#ts_syntax#skip_expr()
48 0.001118 matchup#loader#refresh_match_words()
48 4.021120 0.001011 matchup#ts_engine#get_delim()
36 0.021639 0.000912 matchup#delim#skip()
54 0.000710 matchup#delim#fill_backrefs()
48 4.061295 0.000661 matchup#delim#get_current()
I've narrowed this down to 2dffaed and suspect the issue might be that I have a large number of grammars installed (150+). Performance is much better (but still noticeably stutters) if I drop installed grammars to < 20. Prior to 2dffaed, performance does not seem to be impacted much by number of installed grammars.
Minimal vimrc file
I am using home-manager to generate my nvim configurations and install plugins, so the setup is a little odd.
Plugins are loaded via a wrapped nvim
command:
exec -a "$0" "/nix/store/h58i7gppfamlnadkwzj35wxc9sklpdsx-neovim-unwrapped-0.9.0/bin/nvim" --cmd "set packpath^=/nix/store/dg0ylyiz2fypapk530gly943l1wvf8h4-vim-pack-dir" --cmd "set rtp^=/nix/store/dg0ylyiz2fypapk530gly943l1wvf8h4-vim-pack-dir" "$@"
packpath
here contains only nvim-treesitter
, vim-matchup
, and directories for each of the grammars I have installed (a lot of them).
There is no additional configuration besides this:
-- init.lua
require 'nvim-treesitter.configs'.setup {
ensure_installed = {},
-- andymass/vim-matchup/
matchup = {
enable = true,
},
}
Hmm I don't really see why number of grammars would impact it, my expectation (untested) is for_each_tree would just return 1 tree for a document which doesn't have embedded sub-grammars.
I see a possible issue in 2dffaed though, get_matches was cached whereas collect_group_results does not seem to be.
I added memoization to get_matches, would you be able to try it again?
Tried it just now with all grammars installed:
average last maximum
[get_matching_delims]
initial_pair 0.67ms 0.24ms 2.80ms
get_matches 0.72ms 0.28ms 2.85ms
[loader_init_buffer]
done 15.87ms 9.52ms 25.56ms
[loading]
init_done 19.48ms 19.48ms 19.48ms
[loading_module]
misc 0.01ms 0.01ms 0.01ms
motion 0.05ms 0.03ms 0.05ms
matchparen 8.95ms 8.95ms 8.95ms
loader 9.85ms 9.85ms 9.85ms
[matchparen.highlight]
get_matching 3.04ms 1.22ms 15.02ms
end 3.20ms 1.37ms 15.19ms
get_current 769.76ms 1772.04ms 2219.16ms
[s:get_delim]
first_pass 0.30ms 0.11ms 1.11ms
nothing_found 0.32ms 0.12ms 1.15ms
got_results 1.94ms 0.62ms 14.43ms
[ts_engine.get_delim]
fail 769.20ms 1771.76ms 2218.78ms
It's better than what I first reported, but still a big delay, and much slower than when I pare the grammars down to just this set:
bash,dockerfile,go,java,jq,json,lua,make,nix,python,regex,rust,toml,yaml,zig,javascript,ruby,typescript,vim,vimdoc
average last maximum
[get_matching_delims]
initial_pair 0.36ms 0.21ms 2.39ms
get_matches 0.41ms 0.25ms 2.44ms
[loader_init_buffer]
done 6.43ms 0.54ms 12.20ms
[loading]
init_done 7.62ms 7.62ms 7.62ms
[loading_module]
misc 0.01ms 0.01ms 0.01ms
motion 0.05ms 0.04ms 0.06ms
matchparen 2.38ms 2.38ms 2.38ms
loader 4.44ms 4.44ms 4.44ms
[matchparen.highlight]
get_matching 3.45ms 1.11ms 11.27ms
end 3.62ms 1.26ms 11.45ms
get_current 7.87ms 1.41ms 27.46ms
[s:get_delim]
got_results 2.63ms 0.52ms 10.75ms
first_pass 0.24ms 0.72ms 1.04ms
nothing_found 0.26ms 0.76ms 1.17ms
[ts_engine.get_delim]
fail 7.43ms 0.31ms 24.89ms
Is there more information I can provide to help? I can attach a full func profile if you'd like.
Is there more information I can provide to help? I can attach a full func profile if you'd like.
Not sure why I asked first, it was 1 minute of effort. Here's a profile with all plugins disabled except for vim-matchup
, nvim-treesitter
, and nvim-treesitter-textobjects
.
profile-all-grammars.log
I got this by running:
:e ./src/fmt/print.go
:profile start profile-all-grammars.log
:profile func *
:profile file *
64j
O
asdfasdfasdf <esc>
:profile pause
:q
Do you have a list of TS grammars included in "all grammars?"
I'm thinking that I can use bisection strategy to identify whether there is one grammar that is causing the issue versus a slow increase for each grammar leading to the bad performance.
That's a good idea. Here's the list:
ada,agda,arduino,astro,awk,bash,bass,beancount,bibtex,bicep,blueprint,c,c_sharp,capnp,chatito,clojure,cmake,comment,commonlisp,cooklang,cpon,cpp,css,cuda,cue,d,dart,devicetree,dhall,diff,dockerfile,dot,ebnf,eex,elixir,elm,elsa,elvish,embedded_template,erlang,fennel,firrtl,fish,foam,fortran,fsh,func,fusion,gdscript,git_config,git_rebase,gitattributes,gitcommit,gitignore,gleam,glimmer,glsl,go,godot_resource,gomod,gosum,gowork,graphql,hack,hare,haskell,hcl,heex,hjson,hlsl,hocon,html,htmldjango,http,ini,java,javascript,jq,jsdoc,json,json5,jsonc,jsonnet,julia,kdl,kotlin,lalrpop,latex,ledger,llvm,lua,luadoc,luap,m68k,make,markdown,markdown_inline,matlab,menhir,mermaid,meson,mlir,nickel,ninja,nix,norg,ocaml,ocaml_interface,ocamllex,org,pascal,passwd,perl,php,phpdoc,pioasm,po,poe_filter,pony,prisma,proto,prql,pug,python,ql,qmldir,qmljs,query,r,racket,rasi,regex,rego,rnoweb,ron,rst,ruby,rust,scala,scheme,scss,slint,smali,smithy,solidity,sparql,sql,squirrel,starlark,supercollider,surface,svelte,swift,sxhkdrc,t32,tablegen,teal,terraform,thrift,tiger,tlaplus,todotxt,toml,tsx,turtle,twig,typescript,ungrammar,uxntal,v,vala,verilog,vhs,vim,vimdoc,vue,wgsl,wgsl_bevy,yaml,yang,yuck,zig
I believe it's generated with this list. It's definitely overkill; I haven't even heard of most of the things in this list.
I'll also try to binary search the list later this evening when I'm done with work stuff in case you don't get to it by then. Switching up configs/grammars is pretty quick for me so I'm happy to help with any other tests you can think of.
Ok, I ran a bunch of tests. Basic methodology was keep the grammar for go
turned on, but comment out halves of the remaining 180ish grammars and try to repro.
At first, the numbers did seem to trend downward as I removed grammars, in a way which seemed consistent with the idea that the issue is related to number of grammars.
However, after 2-3 bisections, things plateaued and I think I was able to narrow it down to comment. Here are the numbers for Go + comment:
average last maximum
[get_matching_delims]
initial_pair 0.77ms 0.40ms 3.09ms
get_matches 0.82ms 0.44ms 3.14ms
[loader_init_buffer]
done 16.77ms 3.11ms 29.28ms
[loading]
init_done 5.96ms 5.96ms 5.96ms
[loading_module]
misc 0.01ms 0.01ms 0.01ms
motion 0.05ms 0.03ms 0.06ms
matchparen 1.78ms 1.78ms 1.78ms
loader 3.51ms 3.51ms 3.51ms
[matchparen.highlight]
get_current 179.50ms 0.34ms 448.10ms
get_matching 8.36ms 1.60ms 11.16ms
end 8.56ms 1.81ms 11.35ms
[s:get_delim]
first_pass 0.14ms 0.12ms 0.59ms
nothing_found 0.16ms 0.13ms 0.61ms
got_results 7.17ms 0.70ms 10.40ms
[ts_engine.get_delim]
fail 179.18ms 0.06ms 447.69ms
If I disable the grammar for go
and leave comment
, performance is fine, so there could be a bad interaction between the two for some reason.
The fact that things trended downward at first made me suspect that maybe there was more than one grammar causing problems, but in general, it looks as though things are ok as long as comment
isn't enabled. This could mean that my machine is a little flaky or that the comment grammar has some bad interaction with other grammars as well.
Full data for each bisection is attached if you're interested:
matchup-times.txt
A fantastic analysis, thank you. It does seem like comment causes the loop
2dffaed#diff-11ecb1470262a3880293bec18aee20379a3999712dd6da9041f891a17cc6153bR39
to go from one to many.
So, what would happen if we just skip the loop when the lang is "comment?"
That looks like it solves it!
Here are times for go + comment on the most recent commit.
average last maximum
[get_matching_delims]
initial_pair 0.83ms 0.29ms 4.73ms
get_matches 0.88ms 0.33ms 4.78ms
[loader_init_buffer]
done 5.49ms 0.47ms 10.90ms
[loading]
init_done 5.27ms 5.27ms 5.27ms
[loading_module]
misc 0.01ms 0.01ms 0.01ms
motion 0.05ms 0.03ms 0.06ms
matchparen 1.78ms 1.78ms 1.78ms
loader 2.82ms 2.82ms 2.82ms
[matchparen.highlight]
get_current 3.97ms 0.77ms 16.13ms
get_matching 10.30ms 1.42ms 16.60ms
end 10.46ms 1.58ms 16.77ms
[s:get_delim]
first_pass 0.19ms 0.27ms 0.89ms
nothing_found 0.22ms 0.31ms 0.91ms
got_results 9.05ms 0.72ms 15.94ms
[ts_engine.get_delim]
fail 3.57ms 0.14ms 10.19ms
And as expected, the times do not change very much with all of the rest of the grammars enabled as well.
So I'm happy to consider this issue closed if you are. Let me know if there are any other tests you'd like me to run for any reason.
Thanks for the quick fix and the great plugin.