[Performance] Use `tree-sitter` to provide syntax highlighting?
Hello,
I've noticed that the `bat` command is very slow. Maybe we could use tree-sitter to provide syntax highlighting, since tree-sitter is very fast.
It might be nice to have a proof of concept, so we can make an objective comparison on:
- speed (including startup speed)
- quality of highlighting
- range of languages supported (and how well maintained they are)
- range of color schemes
- ease of adding custom languages
- ease of tweaking color schemes
- binary size / asset bundle size
- compile time
A few good libraries to consider which build on tree-sitter:
- https://crates.io/crates/inkjet
  - Uses helix themes, so good support
- https://crates.io/crates/syntastica
  - Seems to require defining themes in code with a macro
As for language support, the ecosystem is pretty healthy: https://github.com/tree-sitter/tree-sitter/wiki/List-of-parsers
Thanks @aw1cks! Quick question, which wasn't immediately apparent after skimming those crates: do they support incremental highlighting? bat currently highlights one line at a time and outputs decorations like line numbers, which aren't sent to the highlighter. The examples from those two crates seem to imply that the entire code must be provided to the highlighter in one go...
Hmm, good question. I'm not sure, but they do both expose a trait for implementing your own rendering, which may be sufficiently powerful to inject any extra logic needed: inkjet Formatter, syntastica Renderer
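To make the concern concrete: even if a highlighter only accepts the whole source and hands back styled byte ranges, the caller could still split the result per line and add decorations such as line numbers itself. Below is a rough sketch of that idea using only the standard library. `Span`, `render_with_line_numbers`, and the color codes are hypothetical names made up for illustration; they are not part of inkjet, syntastica, or bat, and a real integration would likely look different.

```rust
/// A styled byte range over the whole source, as a whole-file highlighter
/// might report it. Hypothetical type, not taken from any crate.
#[derive(Clone, Copy)]
struct Span {
    start: usize, // inclusive byte offset
    end: usize,   // exclusive byte offset
    color: u8,    // ANSI foreground color code, e.g. 32 = green
}

/// Split whole-file spans per line and prefix each line with its number.
/// Assumes spans are sorted, non-overlapping, and fall on char boundaries.
fn render_with_line_numbers(source: &str, spans: &[Span]) -> Vec<String> {
    let mut out = Vec::new();
    let mut line_start = 0;
    for (idx, line) in source.split('\n').enumerate() {
        let line_end = line_start + line.len();
        // The decoration is added here and never reaches the highlighter.
        let mut rendered = format!("{:>4} | ", idx + 1);
        let mut cursor = line_start;
        for span in spans {
            // Clip the span to the current line; skip it if nothing is left.
            let s = span.start.max(line_start);
            let e = span.end.min(line_end);
            if s >= e {
                continue;
            }
            rendered.push_str(&source[cursor..s]); // unstyled gap before the span
            rendered.push_str(&format!("\x1b[{}m{}\x1b[0m", span.color, &source[s..e]));
            cursor = e;
        }
        rendered.push_str(&source[cursor..line_end]); // unstyled tail of the line
        out.push(rendered);
        line_start = line_end + 1; // skip the '\n'
    }
    out
}
```

A span that crosses a line break (a block comment, say) is simply clipped and re-emitted on each line it touches. This does not address the cost of parsing the whole input up front, which is the other half of the incremental highlighting question.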
I tried using tree-sitter to provide syntax highlighting and it looks feasible. Also, the nvim-neorocks organization provides pre-built tree-sitter parsers packaged as Lua rocks, which are readily available.
Having treesitter support would be great!
Bat is used in many contexts. I'm using it as part of presenterm and I'm missing Gleam support, whereas treesitter would provide this as well as many other languages.
So I'd like to see treesitter support even if the performance is about the same, because it would expand the number of languages that can be highlighted.
Ironically enough one of those 22 issues is this very one :D
Adopting tree-sitter would allow us to leverage other people's work to correct language grammars in a divide-and-conquer approach, resulting in less time spent on bug fixes.
That's of course also true of the current approach of using Sublime Text's syntaxes, but there are currently some long-standing maintenance issues there too. You can always add your own .sublime-syntax file for Gleam, as long as it doesn't use any of the newer features that aren't supported by syntect yet.
My personal take on this is that .sublime-syntax files are within my current skillset - I can personally create new ones from scratch, maintain them etc. even directly in this repo. With tree-sitter, I would be out of my comfort zone. It looks complicated, relying solely on precompiled parsers from other projects or requiring the knowledge of how to compile any tree-sitter parser. So if we were to switch to tree-sitter, I think that bat should just pull in an external dependency which would include all the languages we need. That way, all issues relating to language support and highlighting bugs could be managed by those with the knowledge. (And who knows, maybe I would be able to follow along and eventually contribute to it myself)
Presumably there would still be a need for users to be able to add their own parsers, but bat could defer to the external crate's documentation on how to achieve that.
So in my view, this issue is waiting for someone to build a proof of concept - a crate which bat could pull in and use to:
- determine which parser to use based on file name or first line etc, or a specific language so our existing mappings could be used (perhaps with some tweaks)
- take input line by line and output highlighted text using ANSI escape codes
- allow color scheme selection - presumably auto detection of light vs dark etc could still be done by `bat` and its current dependencies
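As a starting point for discussion, the three requirements above could be captured in an interface roughly like the following. This is only a sketch under assumptions: `LineHighlighter`, `ToyHighlighter`, and `Theme` are hypothetical names, the detection rules cover just two languages, and the "highlighting" is a stand-in that colors `#` comment lines rather than calling a real tree-sitter parser.

```rust
/// Hypothetical interface; none of these names come from bat or any crate.
trait LineHighlighter {
    /// Pick a language from the file name or, failing that, the first line.
    fn detect_language(&self, file_name: &str, first_line: &str) -> Option<&'static str>;
    /// Highlight one line and return it with ANSI escape codes.
    fn highlight_line(&mut self, line: &str) -> String;
}

/// Color scheme choice; the caller could still auto-detect light vs dark.
#[derive(Clone, Copy)]
enum Theme {
    Light,
    Dark,
}

struct ToyHighlighter {
    theme: Theme,
}

impl LineHighlighter for ToyHighlighter {
    fn detect_language(&self, file_name: &str, first_line: &str) -> Option<&'static str> {
        // File extension first, then a shebang on the first line.
        match file_name.rsplit_once('.').map(|(_, ext)| ext) {
            Some("rs") => return Some("rust"),
            Some("py") => return Some("python"),
            _ => {}
        }
        if first_line.starts_with("#!") && first_line.contains("python") {
            return Some("python");
        }
        None
    }

    fn highlight_line(&mut self, line: &str) -> String {
        // Stand-in for a real parser: only colors `#` comment lines.
        let comment_color = match self.theme {
            Theme::Light => 90,
            Theme::Dark => 37,
        };
        if line.trim_start().starts_with('#') {
            format!("\x1b[{}m{}\x1b[0m", comment_color, line)
        } else {
            line.to_string()
        }
    }
}
```

`highlight_line` takes `&mut self` because a real implementation would presumably need to carry parser state from one line to the next, for example while inside a multi-line string or comment.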
And before we would merge it or have it the default, we would want:
- most if not all of `bat`'s current language support to be handled by tree-sitter parsers
- decent performance
- allow custom parsers to be loaded also
Also it looks like there are 22 issues (as of the time of this comment) regarding syntax highlighting. Adopting `tree-sitter` would allow us to leverage other people's work to correct language grammars in a divide-and-conquer approach, resulting in less time spent on bug fixes.
@Velrok Many of those issues could be solved with the current solution, or are out of scope, or "just" require syntect to be updated with support for newer features as @CosmicHorrorDev has already said.
For the record, inkjet is deprecated. The author recommends this, which seems even cleaner, but currently doesn't support incremental highlighting. I'm thinking about creating a PR because I would like this for my own project as well, and regardless there's an active issue. If it's merged, I'd love to do a PR with it, if everyone would be okay with it? Or did another project seem like a better fit?
No objections from me

