Use tree-sitter
Moosems opened this issue · 9 comments
[15:00] Moosems Actually not :)
[15:00] Moosems His chlorophyll package
[15:00] Arrinao oh she did
[15:01] Moosems https://github.com/rdbende/chlorophyll/pull/23
[15:02] Akuli porcupine has one of these too
[15:03] Moosems Yep
[15:04] Akuli in porcupine, the first step is to turn all changes in the text widget into nice Change objects: https://github.com/Akuli/porcupine/blob/main/porcupine/textutils.py#L22-L60
[15:04] Akuli e.g. Change(start=[1, 0], old_end=[1, 5], new_end=[1, 4], old_text='hello', new_text='toot') means replacing 'hello' with 'toot' at the start of the file
[15:04] Moosems Yep
[15:04] Akuli the code that constructs these is puke :D
[15:05] Akuli it basically has to implement everything that the insert, delete and replace manual pages describe
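Here is a minimal sketch of such a Change record and of computing its end positions. It assumes Tk-style [line, column] pairs (1-based lines, 0-based columns). The names `end_position`, `change_for_replace`, `change_for_insert` and `change_for_delete` are hypothetical and are not Porcupine's API. The sketch also skips the index parsing and clamping that the Tk manual pages describe:

```python
from dataclasses import dataclass


# Hypothetical record modeled on the Change fields described above.
# Positions are [line, column]: 1-based lines, 0-based columns (like Tk).
@dataclass
class Change:
    start: list[int]
    old_end: list[int]
    new_end: list[int]
    old_text: str
    new_text: str


def end_position(start: list[int], text: str) -> list[int]:
    """Return the position just after `text` when it is placed at `start`."""
    line, column = start
    newline_count = text.count("\n")
    if newline_count == 0:
        return [line, column + len(text)]
    # After a newline, the column is the length of the text's last line.
    return [line + newline_count, len(text) - text.rfind("\n") - 1]


def change_for_replace(start: list[int], old_text: str, new_text: str) -> Change:
    """Describe replacing old_text (which begins at start) with new_text."""
    return Change(
        start=start,
        old_end=end_position(start, old_text),
        new_end=end_position(start, new_text),
        old_text=old_text,
        new_text=new_text,
    )


# Inserts and deletes are special cases of a replace.
def change_for_insert(start: list[int], text: str) -> Change:
    return change_for_replace(start, "", text)


def change_for_delete(start: list[int], deleted_text: str) -> Change:
    return change_for_replace(start, deleted_text, "")


print(change_for_replace([1, 0], "hello", "toot"))
```

Treating insert and delete as replaces keeps a single code path for whatever consumes the changes.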
[15:07] Akuli it seems to assume that you can just start lexing anywhere you want, at the start of any line?
[15:07] Moosems ?
[15:07] Akuli it basically puts self.get(f"{start_line}.0", f"{end_line}.end") to pygments.lex()
[15:08] Moosems Yes, it only lexes what it needs to
[15:08] Akuli that won't work when you have a multiline string
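The following example shows why this fails. It uses a deliberately tiny, hypothetical "lexer" that only tracks whether each line starts inside a triple-quoted string. If you lex a slice that begins inside the string, every label after that point is wrong:

```python
# Hypothetical toy "lexer": labels each line "string" or "code" depending on
# whether the line starts inside a triple-quoted string.
def classify_lines(lines: list[str], inside: bool = False) -> list[str]:
    labels = []
    for line in lines:
        labels.append("string" if inside else "code")
        # An odd number of triple quotes on a line flips the state
        # (escapes and single-quoted strings are ignored).
        if line.count('"""') % 2 == 1:
            inside = not inside
    return labels


lines = [
    'x = """',
    "def not_a_function():",
    "    pass",
    '"""',
    "def real_function():",
]

# Lexing the whole file gives the right answer...
print(classify_lines(lines))      # ['code', 'string', 'string', 'string', 'code']
# ...but lexing from the second line onward starts in the wrong state.
print(classify_lines(lines[1:]))  # ['code', 'code', 'code', 'string']
```

The slice misreads both the string body and the real function after it. This is the multiline-string problem in miniature.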
[15:08] Moosems Already an issue I opened :)
[15:08] Moosems Which is what highlight_area() will help solve
[15:09] Akuli i have already "solved" it in porcupine about a year ago
[15:09] Akuli i say "solved" because it turned out to be a really hard problem
[15:09] Moosems By finding out when they start and end?
[15:09] Akuli i wanted it to work for all languages that pygments supports, so hard-coding something for python wasn't a solution
[15:09] Akuli this also applies to e.g. multiline comments in c
[15:10] Moosems That's why I think the user should supply multiline strings and docstrings as a parameter
[15:10] Moosems Like in DIP
[15:10] Akuli the best you can do (as far as i can tell): figure out when the lexer's internal state is same as its starting state, and mark those places: you "can" start lexing again from any one of them
[15:10] Moosems And if it's an empty string, it assumes there's no docstring type
[15:10] biberao Akuli: https://www.youtube.com/watch?v=WpAY8TGt2Ks
[15:10] Akuli i say "can" because even that doesn't work in all cases
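Below is a sketch of that checkpoint idea. It uses a hypothetical toy lexer whose entire state is a single string. Real pygments lexers can carry more state than this, which is one reason the trick is not always safe. The sketch records the lexer state at the start of every line. After an edit, re-lexing begins at the nearest earlier line whose state equals the initial one:

```python
INITIAL_STATE = "root"


def next_state(state: str, line: str) -> str:
    """Hypothetical toy lexer step: each triple quote toggles root <-> string."""
    for _ in range(line.count('"""')):
        state = "string" if state == INITIAL_STATE else INITIAL_STATE
    return state


def line_start_states(lines: list[str]) -> list[str]:
    """Return the lexer state at the start of every line (0-based indexes)."""
    states = []
    state = INITIAL_STATE
    for line in lines:
        states.append(state)
        state = next_state(state, line)
    return states


def restart_line(states: list[str], changed_line: int) -> int:
    """Nearest line at or before changed_line where lexing may start fresh."""
    for i in range(changed_line, -1, -1):
        if states[i] == INITIAL_STATE:
            return i
    return 0  # line 0 always starts in the initial state


lines = ['x = """', "def not_a_function():", "    pass", '"""', "def real_function():"]
states = line_start_states(lines)
print(states)                   # ['root', 'string', 'string', 'string', 'root']
print(restart_line(states, 2))  # 0: line 2 is inside the string, so go back to line 0
print(restart_line(states, 4))  # 4: line 4 starts in the initial state
```

After an edit, you would recompute the states from the restart line onward. Suppose the line count did not change, and a recomputed line-start state matches the old one again. Then the lines below that point lex exactly as before and can keep their old highlighting.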
[15:11] Moosems It's a complicated issue
[15:11] Akuli yes :)
[15:13] Moosems I wonder how VS Code does it
[15:13] Akuli to me the solution was to switch to a different highlighting library: tree-sitter
[15:13] Moosems tree-sitter?
[15:13] Akuli with tree-sitter you say "text from 12.34 to 56.78 was previously 'blah blah' but it is now 'blah blah'"
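In py-tree-sitter, that message is `Tree.edit()`. You call it with `start_byte`, `old_end_byte`, `new_end_byte`, `start_point`, `old_end_point` and `new_end_point`. After that, passing the old tree to `parser.parse()` lets tree-sitter reuse the unchanged parts. Tree-sitter points are 0-based (row, column) pairs with the column counted in bytes, unlike Tk's 1-based lines and character columns. The helpers below are a hypothetical sketch of that conversion. They assume a UTF-8 document with `"\n"` line endings:

```python
def tk_to_ts(text: str, line: int, column: int) -> tuple[int, tuple[int, int]]:
    """Convert a Tk-style position (1-based line, 0-based character column)
    in `text` to a tree-sitter byte offset and (row, byte_column) point."""
    lines = text.split("\n")
    row = line - 1
    byte_column = len(lines[row][:column].encode("utf-8"))
    # Every earlier line contributes its UTF-8 bytes plus 1 byte for "\n".
    offset = sum(len(prev.encode("utf-8")) + 1 for prev in lines[:row]) + byte_column
    return offset, (row, byte_column)


def edit_kwargs(old_text: str, new_text: str, start, old_end, new_end) -> dict:
    """Keyword arguments for Tree.edit(), given the whole document before and
    after a change and Tk-style (line, column) positions of the change."""
    # Nothing before `start` changed, so it can be measured in the old text.
    start_byte, start_point = tk_to_ts(old_text, *start)
    old_end_byte, old_end_point = tk_to_ts(old_text, *old_end)
    new_end_byte, new_end_point = tk_to_ts(new_text, *new_end)
    return {
        "start_byte": start_byte,
        "old_end_byte": old_end_byte,
        "new_end_byte": new_end_byte,
        "start_point": start_point,
        "old_end_point": old_end_point,
        "new_end_point": new_end_point,
    }


# Replacing "hello" with "toot" at the start of the file:
print(edit_kwargs("hello world", "toot world", (1, 0), (1, 5), (1, 4)))
```

You could then pass the resulting dict as `tree.edit(**kwargs)` before reparsing. Recomputing offsets by scanning the whole text is only acceptable for a sketch. An editor would want to cache line offsets.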
[15:14] Akuli it is designed to be used in an editor, unlike pygments
[15:15] Moosems How do I use it? Could you help make an MRE?
[15:15] Akuli sorry, i'm not going to spoon-feed you something that took me days to figure out
[15:15] Moosems XD
[15:16] Akuli it's not very straightforward because tree-sitter isn't a library just for syntax highlighting, it gives you a parse tree that contains things like "function definition" instead of things like "the def keyword"
[15:16] Moosems I noticed
[15:16] Akuli then it's your job to turn that syntax tree into whatever you want
[15:16] Akuli in porcupine i set up yaml files that describe how to do this, and there's one for every tree-sitter highlighted language
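Here is a rough sketch of turning a syntax tree into highlight ranges with a per-language mapping. The mapping is the kind of data a YAML file could hold. The `Node` class is a hypothetical stand-in that mimics the `type`, `children`, `start_point` and `end_point` attributes of py-tree-sitter nodes. The mapping is illustrative and is not Porcupine's actual file format:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    start_point: tuple[int, int]
    end_point: tuple[int, int]
    children: list["Node"] = field(default_factory=list)


# Hypothetical per-language config: node type -> pygments-style token name.
PYTHON_TOKEN_TYPES = {
    "def": "Token.Keyword",
    "pass": "Token.Keyword",
    "identifier": "Token.Name",
    "string": "Token.Literal.String",
    "comment": "Token.Comment",
}


def highlight_ranges(node: Node, token_types: dict):
    """Yield (token, start_point, end_point) for every configured node.
    Children of a matched node are skipped, so e.g. a string stays one token."""
    token = token_types.get(node.type)
    if token is not None:
        yield token, node.start_point, node.end_point
        return
    for child in node.children:
        yield from highlight_ranges(child, token_types)


# Simplified tree for "def f(): pass"
tree = Node("module", (0, 0), (0, 13), [
    Node("function_definition", (0, 0), (0, 13), [
        Node("def", (0, 0), (0, 3)),
        Node("identifier", (0, 4), (0, 5)),
        Node("parameters", (0, 5), (0, 7)),
        Node(":", (0, 7), (0, 8)),
        Node("block", (0, 9), (0, 13), [
            Node("pass_statement", (0, 9), (0, 13), [Node("pass", (0, 9), (0, 13))]),
        ]),
    ]),
])

for token, start, end in highlight_ranges(tree, PYTHON_TOKEN_TYPES):
    print(token, start, end)
```

Because the walk is driven by data, supporting another language only requires another mapping, not more code.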
[15:17] Akuli note that porcupine still supports highlighting with pygments, it just isn't the default in e.g. python
[15:17] Moosems It doesn't have nearly as many languages
[15:17] Akuli yeah
[15:17] Akuli in porcupine the idea is to use tree-sitter most of the time, and fall back to pygments when a user has an exotic language
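The fallback can be as simple as a lookup. In this hypothetical sketch, `TREE_SITTER_LANGUAGES` lists the file extensions assumed to have a tree-sitter grammar and highlighting config available. Every other file goes to pygments:

```python
from pathlib import Path
from typing import Optional

# Hypothetical: extensions assumed to have a tree-sitter grammar and a
# highlighting config installed.
TREE_SITTER_LANGUAGES = {".py": "python", ".c": "c", ".rs": "rust"}


def choose_highlighter(filename: str) -> tuple[str, Optional[str]]:
    """Pick tree-sitter for well-supported languages, pygments otherwise."""
    suffix = Path(filename).suffix.lower()
    if suffix in TREE_SITTER_LANGUAGES:
        return "tree-sitter", TREE_SITTER_LANGUAGES[suffix]
    # Exotic language: leave lexer selection to pygments, for example via
    # pygments.lexers.get_lexer_for_filename(filename).
    return "pygments", None


print(choose_highlighter("main.py"))    # ('tree-sitter', 'python')
print(choose_highlighter("build.tcl"))  # ('pygments', None)
```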
[15:18] Akuli this works well because your files are typically small and somewhat simple when you work in an exotic language, for any "real work" you tend to use a popular language instead
[15:19] Moosems Care if we steal the idea?
[15:19] Akuli go ahead :D
[15:19] Akuli you can take all the code if you want, of course :)
Ohh, what did I miss :)
But yeah, I'm aware of tree-sitter, and I even made a couple of these yaml files in Porcupine.
However, I put together tkcode in a couple of hours without ever intending to maintain it (that's why I'm sometimes a bit ignorant about this project), and after I heard about tree-sitter, I didn't really care.
I actually really like this package, which is why I'm so persistent about fixing all the bugs. I updated the PR and fixed the paste issue. Will you join IRC today? Akuli is on right now.
However, if you want to see this feature, go ahead! :))
I believe this feature could be really awesome if done in a Rust backend using PyO3 and the tree-sitter Rust bindings. Will have to figure out how to make it work for those who don't have Rust installed.
What do you mean by "done in a Rust backend"? Why would we need that?
Because the more that's done in Python, the slower this will be. Parsing all the data from tree-sitter takes quite a few for loops (I believe some of them are nested, too), and loops like that are well known to be very slow in Python.
So we need to write a parser that maps highlights.scm captures to pygments tokens.
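Here is one way that translation could work, sketched under assumptions. Capture names in highlights.scm queries are typically dotted (for example `function.builtin`). A lookup can therefore fall back to shorter prefixes before giving up. `CAPTURE_TO_TOKEN` and its entries are hypothetical:

```python
# Hypothetical mapping from highlights.scm capture names to pygments token names.
CAPTURE_TO_TOKEN = {
    "keyword": "Token.Keyword",
    "function": "Token.Name.Function",
    "function.builtin": "Token.Name.Builtin",
    "string": "Token.Literal.String",
    "comment": "Token.Comment",
    "number": "Token.Literal.Number",
}


def token_for_capture(capture: str, default: str = "Token.Text") -> str:
    """Map a capture like 'function.method' to the closest configured token,
    trying 'function.method', then 'function', then the default."""
    parts = capture.split(".")
    while parts:
        name = ".".join(parts)
        if name in CAPTURE_TO_TOKEN:
            return CAPTURE_TO_TOKEN[name]
        parts.pop()
    return default


print(token_for_capture("function.builtin"))     # Token.Name.Builtin
print(token_for_capture("function.method"))      # Token.Name.Function
print(token_for_capture("punctuation.bracket"))  # Token.Text
```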
With the plan to highlight only what's visible, this is unnecessary.
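For reference, a tkinter Text widget can report which lines are on screen. `text.index("@0,0")` gives the index nearest the top-left pixel, and `"@0,<height>"` gives the one at the bottom edge. The sketch below uses a hypothetical function name and relies only on those two widget methods. It assumes something calls it again after scrolling or resizing, for example from a `<Configure>` binding or the `yscrollcommand` callback:

```python
def visible_line_range(text) -> tuple[int, int]:
    """Return the first and last (1-based) line numbers visible in a
    tkinter Text widget, using Tk's "@x,y" pixel indices."""
    first = text.index("@0,0")
    last = text.index(f"@0,{text.winfo_height()}")
    # Tk indices look like "line.column"; keep only the line part.
    return int(first.split(".")[0]), int(last.split(".")[0])
```

With a real widget, you would then retag only the lines in this range, e.g. `text.get(f"{first}.0", f"{last}.end")`. This keeps the work per redraw proportional to the window size rather than the file size.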