performance problems in Go code with large number of tree-sitter grammars installed

Description of Issue

NVIM v0.9.0
Build type: Release
LuaJIT 2.1.0-beta3

  system vimrc file: "$VIM/sysinit.vim"
  fall-back for $VIM: "/nix/store/h58i7gppfamlnadkwzj35wxc9sklpdsx-neovim-unwrapped-0.9.0/share/nvim"

Scrolling through Go code is fairly quick, but editing results in a pretty lengthy pause. Here's the MatchupShowTimes output after just entering Insert mode in src/fmt/print.go (source):

                                   average       last          maximum
[get_matching_delims]
  initial_pair                     0.44ms        0.27ms        2.82ms
  get_matches                      0.49ms        0.32ms        2.87ms
[loader_init_buffer]
  done                            15.58ms        9.15ms       25.14ms
[loading]
  init_done                       18.49ms       18.49ms       18.49ms
[loading_module]
  misc                             0.01ms        0.01ms        0.01ms
  motion                           0.05ms        0.03ms        0.05ms
  matchparen                       8.26ms        8.26ms        8.26ms
  loader                           9.60ms        9.60ms        9.60ms
[matchparen.highlight]
  get_matching                     5.66ms        1.49ms       20.87ms
  end                              5.82ms        1.66ms       21.05ms
  get_current                    803.07ms     1632.90ms     2243.81ms
[s:get_delim]
  first_pass                       0.14ms        0.12ms        0.28ms
  nothing_found                    0.16ms        0.13ms        0.32ms
  got_results                      4.80ms        0.77ms       20.25ms
[ts_engine.get_delim]
  fail                           802.71ms     1632.50ms     2243.52ms

And here is the result of a func profile:

FUNCTIONS SORTED ON TOTAL TIME
count  total (s)   self (s)  function
   53   4.090691   0.007120  4()
   48   4.061295   0.000661  matchup#delim#get_current()
   48   4.060634   0.001619  <SNR>80_get_delim_multi()
   48   4.021120   0.001011  matchup#ts_engine#get_delim()
   48   4.019052             <SNR>76_forward()
   51   1.451991   0.000517  3()
   48   0.037895   0.008010  <SNR>80_get_delim()
   16   0.027112             nvim_treesitter#foldexpr()
   60   0.024977   0.001351  matchup#ts_syntax#skip_expr()
  120   0.022805             <SNR>83_forward()
   36   0.021639   0.000912  matchup#delim#skip()
   60   0.013530   0.000270  matchup#ts_syntax#lang_skip()
   18   0.013403   0.002397  matchup#delim#get_matching()
   18   0.010411   0.003533  <SNR>80_get_matching_delims()
   60   0.010097   0.000551  matchup#ts_syntax#synID()
    1   0.006348             nvim_treesitter#indent()
   18   0.005113   0.004808  <SNR>80_parser_delim_new()
   24   0.004918   0.000443  matchup#delim#skip0()
  264   0.004125             matchup#perf#toc()
  240   0.003169   0.002059  <SNR>79_parse_args()

FUNCTIONS SORTED ON SELF TIME
count  total (s)   self (s)  function
   48              4.019052  <SNR>76_forward()
   16              0.027112  nvim_treesitter#foldexpr()
  120              0.022805  <SNR>83_forward()
   48   0.037895   0.008010  <SNR>80_get_delim()
   53   4.090691   0.007120  4()
    1              0.006348  nvim_treesitter#indent()
   18   0.005113   0.004808  <SNR>80_parser_delim_new()
  264              0.004125  matchup#perf#toc()
   18   0.010411   0.003533  <SNR>80_get_matching_delims()
   18   0.013403   0.002397  matchup#delim#get_matching()
  240   0.003169   0.002059  <SNR>79_parse_args()
   18   0.002233   0.002042  <SNR>58_add_matches()
   48   4.060634   0.001619  <SNR>80_get_delim_multi()
   48   0.001703   0.001611  1()
   60   0.024977   0.001351  matchup#ts_syntax#skip_expr()
   48              0.001118  matchup#loader#refresh_match_words()
   48   4.021120   0.001011  matchup#ts_engine#get_delim()
   36   0.021639   0.000912  matchup#delim#skip()
   54              0.000710  matchup#delim#fill_backrefs()
   48   4.061295   0.000661  matchup#delim#get_current()

I've narrowed this down to 2dffaed and suspect the issue might be that I have a large number of grammars installed (150+). Performance is much better (but still noticeably stutters) if I drop installed grammars to < 20. Prior to 2dffaed, performance does not seem to be impacted much by number of installed grammars.

Minimal vimrc file

I am using home-manager to generate my nvim configurations and install plugins, so the setup is a little odd.

Plugins are loaded via a wrapped nvim command:

exec -a "$0" "/nix/store/h58i7gppfamlnadkwzj35wxc9sklpdsx-neovim-unwrapped-0.9.0/bin/nvim"  --cmd "set packpath^=/nix/store/dg0ylyiz2fypapk530gly943l1wvf8h4-vim-pack-dir" --cmd "set rtp^=/nix/store/dg0ylyiz2fypapk530gly943l1wvf8h4-vim-pack-dir" "$@"

packpath here contains only nvim-treesitter, vim-matchup, and directories for each of the grammars I have installed (a lot of them).

There is no additional configuration besides this:

-- init.lua
require 'nvim-treesitter.configs'.setup {
        ensure_installed = {},
        -- andymass/vim-matchup/
        matchup = {
                enable = true,
        },
}

Hmm I don't really see why number of grammars would impact it, my expectation (untested) is for_each_tree would just return 1 tree for a document which doesn't have embedded sub-grammars.

I see a possible issue in 2dffaed though, get_matches was cached whereas collect_group_results does not seem to be.

I added memoization to get_matches, would you be able to try it again?

Tried it just now with all grammars installed:

                                   average       last          maximum
[get_matching_delims]
  initial_pair                     0.67ms        0.24ms        2.80ms
  get_matches                      0.72ms        0.28ms        2.85ms
[loader_init_buffer]
  done                            15.87ms        9.52ms       25.56ms
[loading]
  init_done                       19.48ms       19.48ms       19.48ms
[loading_module]
  misc                             0.01ms        0.01ms        0.01ms
  motion                           0.05ms        0.03ms        0.05ms
  matchparen                       8.95ms        8.95ms        8.95ms
  loader                           9.85ms        9.85ms        9.85ms
[matchparen.highlight]
  get_matching                     3.04ms        1.22ms       15.02ms
  end                              3.20ms        1.37ms       15.19ms
  get_current                    769.76ms     1772.04ms     2219.16ms
[s:get_delim]
  first_pass                       0.30ms        0.11ms        1.11ms
  nothing_found                    0.32ms        0.12ms        1.15ms
  got_results                      1.94ms        0.62ms       14.43ms
[ts_engine.get_delim]
  fail                           769.20ms     1771.76ms     2218.78ms

It's better than what I first reported, but still a big delay, and much slower than when I pare the grammars down to just this set:

bash,dockerfile,go,java,jq,json,lua,make,nix,python,regex,rust,toml,yaml,zig,javascript,ruby,typescript,vim,vimdoc

                                   average       last          maximum
[get_matching_delims]
  initial_pair                     0.36ms        0.21ms        2.39ms
  get_matches                      0.41ms        0.25ms        2.44ms
[loader_init_buffer]
  done                             6.43ms        0.54ms       12.20ms
[loading]
  init_done                        7.62ms        7.62ms        7.62ms
[loading_module]
  misc                             0.01ms        0.01ms        0.01ms
  motion                           0.05ms        0.04ms        0.06ms
  matchparen                       2.38ms        2.38ms        2.38ms
  loader                           4.44ms        4.44ms        4.44ms
[matchparen.highlight]
  get_matching                     3.45ms        1.11ms       11.27ms
  end                              3.62ms        1.26ms       11.45ms
  get_current                      7.87ms        1.41ms       27.46ms
[s:get_delim]
  got_results                      2.63ms        0.52ms       10.75ms
  first_pass                       0.24ms        0.72ms        1.04ms
  nothing_found                    0.26ms        0.76ms        1.17ms
[ts_engine.get_delim]
  fail                             7.43ms        0.31ms       24.89ms

Is there more information I can provide to help? I can attach a full func profile if you'd like.

Is there more information I can provide to help? I can attach a full func profile if you'd like.

Not sure why I asked first, it was 1 minute of effort. Here's a profile with all plugins disabled except for vim-matchup, nvim-treesitter, and nvim-treesitter-textobjects.
profile-all-grammars.log

I got this by running:

:e ./src/fmt/print.go
:profile start profile-all-grammars.log
:profile func *
:profile file *
64j
O
asdfasdfasdf <esc>
:profile pause
:q

Do you have a list of TS grammars included in "all grammars?"

I'm thinking that I can use bisection strategy to identify whether there is one grammar that is causing the issue versus a slow increase for each grammar leading to the bad performance.

That's a good idea. Here's the list:

ada,agda,arduino,astro,awk,bash,bass,beancount,bibtex,bicep,blueprint,c,c_sharp,capnp,chatito,clojure,cmake,comment,commonlisp,cooklang,cpon,cpp,css,cuda,cue,d,dart,devicetree,dhall,diff,dockerfile,dot,ebnf,eex,elixir,elm,elsa,elvish,embedded_template,erlang,fennel,firrtl,fish,foam,fortran,fsh,func,fusion,gdscript,git_config,git_rebase,gitattributes,gitcommit,gitignore,gleam,glimmer,glsl,go,godot_resource,gomod,gosum,gowork,graphql,hack,hare,haskell,hcl,heex,hjson,hlsl,hocon,html,htmldjango,http,ini,java,javascript,jq,jsdoc,json,json5,jsonc,jsonnet,julia,kdl,kotlin,lalrpop,latex,ledger,llvm,lua,luadoc,luap,m68k,make,markdown,markdown_inline,matlab,menhir,mermaid,meson,mlir,nickel,ninja,nix,norg,ocaml,ocaml_interface,ocamllex,org,pascal,passwd,perl,php,phpdoc,pioasm,po,poe_filter,pony,prisma,proto,prql,pug,python,ql,qmldir,qmljs,query,r,racket,rasi,regex,rego,rnoweb,ron,rst,ruby,rust,scala,scheme,scss,slint,smali,smithy,solidity,sparql,sql,squirrel,starlark,supercollider,surface,svelte,swift,sxhkdrc,t32,tablegen,teal,terraform,thrift,tiger,tlaplus,todotxt,toml,tsx,turtle,twig,typescript,ungrammar,uxntal,v,vala,verilog,vhs,vim,vimdoc,vue,wgsl,wgsl_bevy,yaml,yang,yuck,zig

I believe it's generated with this list. It's definitely overkill; I haven't even heard of most of the things in this list.

I'll also try to binary search the list later this evening when I'm done with work stuff in case you don't get to it by then. Switching up configs/grammars is pretty quick for me so I'm happy to help with any other tests you can think of.

Ok, I ran a bunch of tests. Basic methodology was keep the grammar for go turned on, but comment out halves of the remaining 180ish grammars and try to repro.

At first, the numbers did seem to trend downward as I removed grammars, in a way which seemed consistent with the idea that the issue is related to number of grammars.

However, after 2-3 bisections, things plateaued and I think I was able to narrow it down to comment. Here are the numbers for Go + comment:

                                   average       last          maximum
[get_matching_delims]
  initial_pair                     0.77ms        0.40ms        3.09ms
  get_matches                      0.82ms        0.44ms        3.14ms
[loader_init_buffer]
  done                            16.77ms        3.11ms       29.28ms
[loading]
  init_done                        5.96ms        5.96ms        5.96ms
[loading_module]
  misc                             0.01ms        0.01ms        0.01ms
  motion                           0.05ms        0.03ms        0.06ms
  matchparen                       1.78ms        1.78ms        1.78ms
  loader                           3.51ms        3.51ms        3.51ms
[matchparen.highlight]
  get_current                    179.50ms        0.34ms      448.10ms
  get_matching                     8.36ms        1.60ms       11.16ms
  end                              8.56ms        1.81ms       11.35ms
[s:get_delim]
  first_pass                       0.14ms        0.12ms        0.59ms
  nothing_found                    0.16ms        0.13ms        0.61ms
  got_results                      7.17ms        0.70ms       10.40ms
[ts_engine.get_delim]
  fail                           179.18ms        0.06ms      447.69ms

If I disable the grammar for go and leave comment, performance is fine, so there could be a bad interaction between the two for some reason.

The fact that things trended downward at first made me suspect that maybe there was more than one grammar causing problems, but in general, it looks as though things are ok as long as comment isn't enabled. This could mean that my machine is a little flaky or that the comment grammar has some bad interaction with other grammars as well.

Full data for each bisection is attached if you're interested:
matchup-times.txt

A fantastic analysis, thank you. It does seem like comment causes the loop
2dffaed#diff-11ecb1470262a3880293bec18aee20379a3999712dd6da9041f891a17cc6153bR39
to go from one to many.

So, what would happen if we just skip the loop when the lang is "comment?"

vim-matchup/lua/treesitter-matchup/internal.lua

Line 41 in 1364b2b

if not tree or lang_tree:lang() == 'comment' then

That looks like it solves it!

Here are times for go + comment on the most recent commit.

                                   average       last          maximum
[get_matching_delims]
  initial_pair                     0.83ms        0.29ms        4.73ms
  get_matches                      0.88ms        0.33ms        4.78ms
[loader_init_buffer]
  done                             5.49ms        0.47ms       10.90ms
[loading]
  init_done                        5.27ms        5.27ms        5.27ms
[loading_module]
  misc                             0.01ms        0.01ms        0.01ms
  motion                           0.05ms        0.03ms        0.06ms
  matchparen                       1.78ms        1.78ms        1.78ms
  loader                           2.82ms        2.82ms        2.82ms
[matchparen.highlight]
  get_current                      3.97ms        0.77ms       16.13ms
  get_matching                    10.30ms        1.42ms       16.60ms
  end                             10.46ms        1.58ms       16.77ms
[s:get_delim]
  first_pass                       0.19ms        0.27ms        0.89ms
  nothing_found                    0.22ms        0.31ms        0.91ms
  got_results                      9.05ms        0.72ms       15.94ms
[ts_engine.get_delim]
  fail                             3.57ms        0.14ms       10.19ms

And as expected, the times do not change very much with all of the rest of the grammars enabled as well.

So I'm happy to consider this issue closed if you are. Let me know if there are any other tests you'd like me to run for any reason.

Thanks for the quick fix and the great plugin.