georgewfraser/vscode-tree-sitter

Color Issues

jeff-hykin opened this issue ยท 13 comments

I'm not sure how the colors of theme's are being applied but lots of themes still look worse with the tree sitter, at least for C++ ๐Ÿ˜•I'd recommend upgrading the C++ TextMate grammar to pull from VS Code's upstream source but even after that there are some bugs that need to be worked out.

I think this issue will be fixed when the TreeSitter only adds colors, and the added colors correspond to the theme's colors for those things.

Material Theme:

With the tree sitter:
material-with-sitter

Without the tree sitter and with the Better C++ Syntax extension (the upstream for the C++ syntax):
material-no-sitter

XD Theme

Using a theme that is more fitted to C++ make the difference more extreme:

With the tree sitter:
with-sitter

Without tree-sitter, and with Better C++ Syntax
no-sitter

I'm not really a C++ expert so it's hard for me to tell what's good and bad. I think the reason the C++ grammar doesn't look great is because I just haven't spent much time on it.

The C++ highlighting you see is a combination of two "layers":

  1. A simplified C++ grammar that just colors keywords and literals.
  2. A javascript function, colorCpp, that traverses the tree-sitter syntax tree and adds tricky colors using setDecorations.

There are significant advantages to this two-layer strategy:

  • When you first open a file, the basic colors appear right away, and the tricky colors "pop" in after about a delay of a tenth of a second or so.
  • setDecorations has a meaningful cost, so it should be used selectively. If we used setDecorations to color everything, you would notice delays in coloring when you scroll around large files.

We just need to invest a little time fixing up the TextMate grammar and the color function. I just pushed a change 8f5d623 that makes it possible for colorCpp to use arbitrary textmate scopes. Would be great if you wanted to tackle this!

The grammar still seems to appear right away even with the more complex TextMate grammar, so you might want to upgrade it at least temporarily.
https://github.com/jeff-hykin/cpp-textmate-grammar/blob/master/syntaxes/cpp.tmLanguage.json

Me and another guys have invested quite a lot of time fixing up the C++ grammar and I'd hate for people to have to do the same work twice.

I'd love to help out mapping the TextMate scopes to the tree-sitter! When I get back to working on this I'll take a look at the colorCpp.

I don't think there's an issue with the TextMate grammar. In this extension, the job of the TextMate grammar is just to color the easy parts. This is what C++ looks like if you turn off the colorCpp function:

Screen Shot 2019-05-26 at 11 53 33 AM

It's the job of colorCpp to do anything that isn't in that screenshot. colorCpp has access to the whole parse tree, so it can do very accurate syntax coloring. For example, in Go, package members are colored differently than fields:

shadow mov

You have to write the color function very carefully to ensure fast performance on large files but it's doable.

Here's some of the issues with the minimal TextMate Grammar, even the easy parts
The #errors are highlighted incorrectly
Screen Shot 2019-05-26 at 9 51 51 PM

Parameters are not highlighted at all
Screen Shot 2019-05-26 at 9 55 30 PM

Custom literals, hex floating point numbers, and number separators are not highlighted correctly
(all of them should be red)
Screen Shot 2019-05-26 at 9 55 34 PM

Names with escapes in them are broken
Screen Shot 2019-05-26 at 9 59 01 PM

Attributes are never tagged/highlighted
Screen Shot 2019-05-26 at 10 34 21 PM

And many many tokens have the wrong scope/coloring:
namespace, using, class, void, semicolon, and, or, xor, typeid, sizeof, alignas, new, delete, auto, etc

Here's a better example (the same theme in both pictures).

The only things highlighted correctly are return and the integer-literals (e.g. 10)
The int, auto, mutable and ; are highlighted incorrectly, and most things are not highlighted at all
Screen Shot 2019-05-26 at 10 17 18 PM

Everything in this next picture (using only the TextMate grammar) is highlighted correctly

Screen Shot 2019-05-26 at 10 18 07 PM

I'm not saying the problem is "the TextMate grammar isn't doing enough" I just wanted to point out that the combination of the minimal TextMate grammar and the tree-sitter is currently a significant downgrade for many themes. It could just mean that the tree-sitter just needs to do a lot more.

I think if there's ever going to be widespread adoption of the Tree Sitter, its going to need to be at least semi backwards compatible with the existing themes.

Help me understand this part:

And many many tokens have the wrong scope/coloring:
namespace, using, class, void, semicolon, and, or, xor, typeid, sizeof, alignas, new, delete, auto, etc

Are you saying namespace should be colored differently than using? They're both keywords and when I look at other IDEs like Visual Studio they're colored the same.

Yes they're functionally very different. I think should be up to the theme maker to choose whether or not namespace and using are colored different. In my opinion it's the parser's job to tell VS Code exactly what things are (e.g. a c++ lambda function definition, a logical operator, a keyword usage) and then it's the theme's job to pick a color for specific things.

new, delete, and, or, xor are operators they're more similar to the + operator than the using keyword, and in particular the new and delete operators are memory operators.

class, namespace, struct, union , enum are storage type declarorators.

if, else, while, try etc are control flow, them (and functions) are the only thing that can make the code execute non linearly (e.g. more than once, or not at all)

void , int , bool , double , etc are storage types

const, mutable, static etc are type modifiers

template and typedef are type declarators

Etc.

The differences are really important and many many themes take advantage of them

One of the main reasons I prefer VS Code over IDE's is because the colors in VS Code can do a much better job at illustratrating the meaning of the code, and you can pick to highlight what is most meaningful to you and your project.

In the second picture: type specifiers are pink mutable, types are yellow auto, int, Ret, Args, parameter definitions are orange, functions are dark blue, control flow is light blue, assignment operators are bright red, number literals are soft red, member access is dim green.

Ok, I think I understand: right now I have all the keywords in two categories, keyword.control and keyword. Instead I should use the longer, more specific scopes like keyword.other.namespace or whatever it is, so that existing c++ color themes can pick them up and color them differently.

Weโ€™ll also need to put back a bunch of punctuation scopes for brackets and things, and it looks like entity.name.type is different than the scope c++ was using before.

Sounds good!

There is a large challenge around the naming though, longer names are not really sufficient. There is a TextMate naming convention that is used as a standard to get compatibility across languages. On the C++ TextMate repo, we've spent a lot of time with people from VS Code coordinating changes to the scopes, how they should be named, and then talking to theme makers when they change. Whenever the TextMate conventions don't cover something, we search all of the other TextMate grammars in order to find the most similar naming among all languages.

It's a non trivial task, there are 600 scopes in C++ and more are going to be added throughout the summer. Many of them are generated programmatically. That's why I recommend starting with an existing C++ grammar and then only removing the parts that the Tree-Sitter replaces.

Looking at my Go code and switching through a bunch of themes with the plugin enabled, it seems like some colors are hard-coded, like the custom type name and the types in a struct.

Yeah, that is one of the overall current issues with the extension. Most of the colors are hardcoded because VS Code doesn't have a good way of letting extensions get theme colors.

No I'm getting the colors from VSCode. But if you switch themes, my colors don't update until you switch to a different file. I think VSCode has added an API to get notified of theme colors, I haven't had a chance to look into it yet though.

I mean code (TextMate) colors like "source.js operator.keyword.control", not the "syntax.variable" colors