tree-sitter/tree-sitter-c-sharp

Parser errors on malformed C# code different in playground

nibab opened this issue · 2 comments

nibab commented

The following malformed code
using System; class Program { static void Main() { Console.WriteLine(\"Hello, World); } }
parses reasonably well in the playground at https://tree-sitter.github.io/tree-sitter/playground (i.e. the error occurs much lower in the tree):

compilation_unit [0, 0] - [1, 0]
  using_directive [0, 0] - [0, 13]
    identifier [0, 6] - [0, 12]
  class_declaration [0, 14] - [0, 89]
    name: identifier [0, 20] - [0, 27]
    body: declaration_list [0, 28] - [0, 89]
      method_declaration [0, 30] - [0, 87]
        modifier [0, 30] - [0, 36]
        type: void_keyword [0, 37] - [0, 41]
        name: identifier [0, 42] - [0, 46]
        parameters: parameter_list [0, 46] - [0, 48]
        body: block [0, 49] - [0, 87]
          expression_statement [0, 51] - [0, 85]
            invocation_expression [0, 51] - [0, 84]
              function: member_access_expression [0, 51] - [0, 68]
                expression: identifier [0, 51] - [0, 58]
                name: identifier [0, 59] - [0, 68]
              arguments: argument_list [0, 68] - [0, 84]
                ERROR [0, 69] - [0, 71]
                  escape_sequence [0, 69] - [0, 71]
                argument [0, 71] - [0, 76]
                  identifier [0, 71] - [0, 76]
                argument [0, 78] - [0, 83]
                  identifier [0, 78] - [0, 83]

This is what the grammar file from tree-sitter-c-sharp-0.20.0 outputs instead (i.e. the ERROR node covers the whole class, so no method_declaration node is produced):

compilation_unit: [0] - [88]
  using_directive: [0] - [13]
    using: [0] - [5]
    identifier: [6] - [12]
    ;: [12] - [13]
  ERROR: [14] - [88]
    class: [14] - [19]
    identifier: [20] - [27]
    {: [28] - [29]
    modifier: [30] - [36]
      static: [30] - [36]
    predefined_type: [37] - [41]
    identifier: [42] - [46]
    parameter_list: [46] - [48]
      (: [46] - [47]
      ): [47] - [48]
    {: [49] - [50]
    member_access_expression: [51] - [68]
      identifier: [51] - [58]
      .: [58] - [59]
      identifier: [59] - [68]
    (: [68] - [69]
    ": [69] - [70]
    string_literal_fragment: [70] - [88]

I'm curious why that is. It is difficult to know which grammar version the playground is using, and it's possible that it is an older one, although I would argue that being able to parse the method declaration is significantly better.
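To put a number on "lower in the tree", here is a minimal Python sketch that reads an indented tree dump and reports how deeply each ERROR node is nested. The function name `error_depths` and the two-space indent are assumptions for illustration, and the two dumps are abbreviated copies of the ones above (only the lines on the path to the ERROR node are kept). It is not part of tree-sitter.

```python
def error_depths(dump: str, indent: int = 2) -> list[int]:
    """Return the nesting depth of every ERROR node in an indented tree dump."""
    depths = []
    for line in dump.splitlines():
        stripped = line.lstrip(" ")
        # The text before the first "[" is either "<field>: <type>" (playground
        # format) or "<type>:" (0.20.0 format); the node type is its last word.
        head = stripped.split("[", 1)[0].split()
        if head and head[-1].rstrip(":") == "ERROR":
            depths.append((len(line) - len(stripped)) // indent)
    return depths


# Abbreviated playground dump: the path from the root to the ERROR node.
PLAYGROUND_DUMP = "\n".join([
    "compilation_unit [0, 0] - [1, 0]",
    "  class_declaration [0, 14] - [0, 89]",
    "    body: declaration_list [0, 28] - [0, 89]",
    "      method_declaration [0, 30] - [0, 87]",
    "        body: block [0, 49] - [0, 87]",
    "          expression_statement [0, 51] - [0, 85]",
    "            invocation_expression [0, 51] - [0, 84]",
    "              arguments: argument_list [0, 68] - [0, 84]",
    "                ERROR [0, 69] - [0, 71]",
])

# Abbreviated tree-sitter-c-sharp-0.20.0 dump.
V0_20_0_DUMP = "\n".join([
    "compilation_unit: [0] - [88]",
    "  using_directive: [0] - [13]",
    "  ERROR: [14] - [88]",
    "    class: [14] - [19]",
])

print(error_depths(PLAYGROUND_DUMP), error_depths(V0_20_0_DUMP))
# prints: [8] [1]
```

In the playground tree the ERROR sits eight levels down, inside the argument list, while in the 0.20.0 tree it is a direct child of compilation_unit.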

For reference, searching for the method declaration with Roslyn through:

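using System.Collections.Generic;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// MethodData is not shown in the original comment; this record is an
// assumed minimal definition so that the walker compiles.
public record MethodData(string Name, string Body);

// Collects the name and body text of every method declaration visited.
public class MethodWalker : CSharpSyntaxWalker
{
    public List<MethodData> Methods { get; } = new List<MethodData>();

    public override void VisitMethodDeclaration(MethodDeclarationSyntax node)
    {
        string methodName = node.Identifier.ValueText;
        // Body is null for expression-bodied or abstract methods.
        string methodBody = node.Body?.ToString();

        Methods.Add(new MethodData(methodName, methodBody));

        // Keep walking so nested declarations (e.g. local functions' parents) are visited.
        base.VisitMethodDeclaration(node);
    }
}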

will also find the method declaration.

I tried parsing this input with the latest version of the grammar in the master branch, and I got the following. The tree seems to match the one returned by the playground.

❯ tree-sitter parse a.cs
(compilation_unit [0, 0] - [0, 89]
  (using_directive [0, 0] - [0, 13]
    name: (identifier [0, 6] - [0, 12]))
  (class_declaration [0, 14] - [0, 89]
    name: (identifier [0, 20] - [0, 27])
    body: (declaration_list [0, 28] - [0, 89]
      (method_declaration [0, 30] - [0, 87]
        (modifier [0, 30] - [0, 36])
        type: (predefined_type [0, 37] - [0, 41])
        name: (identifier [0, 42] - [0, 46])
        parameters: (parameter_list [0, 46] - [0, 48])
        body: (block [0, 49] - [0, 87]
          (expression_statement [0, 51] - [0, 85]
            (invocation_expression [0, 51] - [0, 84]
              function: (member_access_expression [0, 51] - [0, 68]
                expression: (identifier [0, 51] - [0, 58])
                name: (identifier [0, 59] - [0, 68]))
              arguments: (argument_list [0, 68] - [0, 84]
                (ERROR [0, 69] - [0, 71]
                  (escape_sequence [0, 69] - [0, 71]))
                (argument [0, 71] - [0, 76]
                  (identifier [0, 71] - [0, 76]))
                (argument [0, 78] - [0, 83]
                  (identifier [0, 78] - [0, 83]))))))))))
a.cs    0 ms    (ERROR [0, 69] - [0, 71])

We should probably publish a new version. I can do the GitHub and npm side but as for the Rust crate... 🤷🏻