tree-sitter/tree-sitter-c-sharp

Parsing failure in CSharp script (.csx) file with `#r` directives

blazkowolf opened this issue · 2 comments

I haven't looked into the code for this myself, but I suspect it's an edge case in the tree-sitter grammar. When `#r` directives appear at the top of a .csx file, the highlighting provided by tree-sitter breaks and only recovers partway down the source file.

Ex.
image

Then a little ways down the file, you can observe the highlighting kicks back in.
image

Then with the #r directives commented out, the highlighting works as expected.
image

My apologies in advance if this is the incorrect forum for an issue like this.

Options

We have a couple of options here:

  1. We add support for these .csx specific preprocessor instructions

This isn't difficult, except that I can't find an official list of them anywhere. They would be coloured correctly, but they would also be accepted as valid in normal C# files. Likewise, we might allow things in CSX files that aren't valid there; again, I can't find a spec.

I think this is what Roslyn does: it recognises the "#r" syntax even inside .cs files and then tells you it's only valid in scripts. We don't have this second level of problem flagging in tree-sitter.

  2. We add a "bad directive" preprocessor instruction

Roslyn also does this to handle all sorts of scenarios, and it lets the parser recover from preprocessor directives it doesn't understand. This should be easy to do and would mean we continue to parse a whole lot of invalid scenarios nicely, e.g. putting #exit in a file. Roslyn again allows this, so the whole file is still highlighted, and it uses the second level to flag that there is no such directive.

  3. We nest/extend the C# syntax from a new C# Script syntax

I don't know how to do this, but I suspect it's possible and that examples likely exist in other tree-sitter language grammars.
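
To make the difference between options 1 and 2 concrete, here is a rough Python sketch of the two-level idea described above: every `#name` line is accepted as a directive (so parsing never derails), and a separate pass attaches a diagnostic for script-only or unknown names. This is not Roslyn's or tree-sitter's actual code. The function and class names are made up for illustration, and the script directive set (`r` from this issue, plus `load`) is an assumption, since no official list has been found.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Standard C# preprocessor directive names.
STANDARD_DIRECTIVES = {
    "if", "elif", "else", "endif", "define", "undef", "warning",
    "error", "line", "region", "endregion", "pragma", "nullable",
}

# Script-only directives. ASSUMPTION: "r" comes from this issue, "load" is
# added for illustration; there is no official list to check against.
SCRIPT_DIRECTIVES = {"r", "load"}

# A directive line: optional whitespace, '#', optional whitespace, a name.
DIRECTIVE_RE = re.compile(r"^\s*#\s*([A-Za-z_]\w*)")


@dataclass
class Directive:
    name: str
    kind: str                         # "standard", "script", or "bad"
    diagnostic: Optional[str] = None  # second-level flag, None if valid


def classify_directive(line: str, is_script: bool) -> Optional[Directive]:
    """Hypothetical helper: classify one source line, never failing on '#name'."""
    match = DIRECTIVE_RE.match(line)
    if match is None:
        return None  # not a directive line (this also skips a '#!' line)
    name = match.group(1)
    if name in STANDARD_DIRECTIVES:
        return Directive(name, "standard")
    if name in SCRIPT_DIRECTIVES:
        # Option 1: recognised everywhere, flagged outside scripts.
        diag = None if is_script else f"#{name} is only valid in scripts"
        return Directive(name, "script", diag)
    # Option 2: catch-all "bad directive" so the rest of the file still parses.
    return Directive(name, "bad", f"unknown directive #{name}")
```

With this shape, `classify_directive('#exit', is_script=False)` still yields a directive node (kind `"bad"`) rather than a parse failure, which is the recovery behaviour option 2 is after. Tree-sitter itself has no diagnostic channel, so in practice only the `kind` part would map onto grammar rules.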

Going forward

Right now I'm tempted to do option 2, as it opens up a whole lot of recovery options. It would be great if there was a way of indicating that a parsing rule is there for recovery but isn't itself valid. Is this possible, @maxbrunsfeld?

svick commented

I can't find a specific official list of them anywhere

I don't know if there's an official list that includes the scripting directives, but here is the code in Roslyn that parses directives (don't miss the handling of #! at the end) and here is the code that maps directive names to the SyntaxKind used by the parser.

Though you may already know that.