AbstractMachinesLab/tree-sitter-erlang

parser hang when typing integer value in function body

wingyplus opened this issue ยท 10 comments

Consider screenshot below:

Screen Shot 2563-12-10 at 18 24 30

After I typing 1 in function body, the parser got hang and cpu was increased. It's happens on tree-sitter web-ui also.

Reproduced by test corpus:

================
Function clause bug
================

-module(amod).

fibonacci(1) ->
    1

---
(source_file ())

The result from tree-sitter test:

$ yarn tree-sitter test -f 'Function clause bug'
comments:
expr_function_call:
term_string:
macros:
term_char:
expr_list:
advent_of_code_2020:
term_tuple:
otp_lists:
term_atom:
module_attribute:
expr_case:
expr_exceptions:
expr_bit_string:
term_float:
term_integer:
expr_map:
expr_send:
types:
term_binary_string:
expr_receive:
expr_variable:
term_fun:
term_list:
term_record:
expr_if:
function_clause_bug:
<< hang

Thanks! I'll add that test case and look into it ๐Ÿ™Œ๐Ÿผ

Btw is that neovim on your screenshot?

@Ostera yes. The left hand side is nvim treesitter playground plugin. :)

Sweet!

What would the expected behavior be here? I presume you'd want a "MISSING" error.

Sweet!

What would the expected behavior be here? I presume you'd want a "MISSING" error.

It would be like you presume or tokenized correctly (in the test case above, it should be a number, I guess). And it should be able to continue typing not hang up.

I found that the parser not hang up after comment $.expr_operator out in L313.

yarn run v1.22.10
$ /Users/thanabodee/src/github.com/wingyplus/tree-sitter-erlang/node_modules/.bin/tree-sitter parse amod.erl -d
(source_file [0, 0] - [4, 0]
  (module_name [0, 0] - [0, 14]
    (atom [0, 8] - [0, 12]
      value: (unquoted_atom [0, 8] - [0, 12])))
  (ERROR [2, 0] - [3, 3]
    (atom [2, 0] - [2, 9]
      value: (unquoted_atom [2, 0] - [2, 9]))
    (pattern [2, 10] - [2, 11]
      (term [2, 10] - [2, 11]
        (integer [2, 10] - [2, 11])))
    (ERROR [3, 2] - [3, 3])))
amod.erl	0 ms	(ERROR [2, 0] - [3, 3])
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

NOTE: the amod.erl is the example from my screenshot.

Hm, the way we're using the choice operator may be too expensive / run too many branches?

Could you try this approach where we'd be flattening out the branching (right now its in 2 levels for us)?

so instead of

expr_operator_binary: ($) =>
  choice(
    prec.left(
      PREC.BINARY_OP,
      seq(
        field("lhs", $.expression),
        field("operator", oneOf(OP2_LEFT_ASSOC)),
        field("rhs", $.expression)
      )
    ),
    prec.right(
      PREC.BINARY_OP,
      seq(
        field("lhs", $.expression),
        field("operator", oneOf(OP2_RIGHT_ASSOC)),
        field("rhs", $.expression)
      )
    )
  ),

we'd have something like:

expr_operator_binary: ($) => choice(  
  ...[
  ...OP2_LEFT_ASSOC.map( op => [ "left", op ])
  ...OP2_RIGHT_ASSOC.map( op => [ "right", op ])
  ].map( [dir, op] => 
    prec[dir](
      PREC.BINARY_OP,
      seq(
        field("lhs", $.expression),
        field("operator", op),
        field("rhs", $.expression)
      )
    )
  ),   

It's still not working:

diff --git a/grammar.js b/grammar.js
index b4205ba..5c1d984 100644
--- a/grammar.js
+++ b/grammar.js
@@ -405,20 +405,17 @@ module.exports = grammar({

     expr_operator_binary: ($) =>
       choice(
-        prec.left(
-          PREC.BINARY_OP,
-          seq(
-            field("lhs", $.expression),
-            field("operator", oneOf(OP2_LEFT_ASSOC)),
-            field("rhs", $.expression)
-          )
-        ),
-        prec.right(
-          PREC.BINARY_OP,
-          seq(
-            field("lhs", $.expression),
-            field("operator", oneOf(OP2_RIGHT_ASSOC)),
-            field("rhs", $.expression)
+        ...[
+          ...OP2_LEFT_ASSOC.map((op) => [prec.left, op]),
+          ...OP2_RIGHT_ASSOC.map((op) => [prec.right, op]),
+        ].map(
+          ([_prec, op]) => _prec(
+            PREC.BINARY_OP,
+            seq(
+              field("lhs", $.expression),
+              field("operator", op),
+              field("rhs", $.expression)
+            )
           )
         )
       ),
$ yarn tree-sitter generate && yarn tree-sitter parse amod.erl -d
yarn run v1.22.10
$ /Users/thanabodee/src/github.com/wingyplus/tree-sitter-erlang/node_modules/.bin/tree-sitter generate
โœจ  Done in 0.77s.
yarn run v1.22.10
$ /Users/thanabodee/src/github.com/wingyplus/tree-sitter-erlang/node_modules/.bin/tree-sitter parse amod.erl -d
new_parse
process version:0, version_count:1, state:1, row:1, col:0
lex_internal state:47, row:1, column:0
  consume character:'-'
lexed_lookahead sym:-, size:1
shift state:4
process version:0, version_count:1, state:4, row:1, col:1
lex_internal state:47, row:1, column:1
  consume character:'m'
  consume character:'o'
  consume character:'d'
  consume character:'u'
  consume character:'l'
  consume character:'e'
  consume character:'m'
  consume character:'o'
  consume character:'d'
  consume character:'u'
  consume character:'l'
  consume character:'e'
lexed_lookahead sym:module, size:6
shift state:830
process version:0, version_count:1, state:830, row:1, col:7
lex_internal state:0, row:1, column:7
  consume character:'('
lexed_lookahead sym:(, size:1
shift state:539
process version:0, version_count:1, state:539, row:1, col:8
lex_internal state:0, row:1, column:8
  consume character:'a'
  consume character:'m'
  consume character:'o'
  consume character:'d'
  consume character:'a'
lexed_lookahead sym:unquoted_atom, size:4
shift state:11
process version:0, version_count:1, state:11, row:1, col:12
lex_internal state:45, row:1, column:12
  consume character:')'
lexed_lookahead sym:), size:1
reduce sym:atom, child_count:1
shift state:790
process version:0, version_count:1, state:790, row:1, col:13
lex_internal state:47, row:1, column:13
  consume character:'.'
lexed_lookahead sym:., size:1
shift state:397
process version:0, version_count:1, state:397, row:1, col:14
lex_internal state:47, row:1, column:14
  skip character:10
  skip character:10
  consume character:'f'
  consume character:'i'
  consume character:'b'
  consume character:'o'
  consume character:'n'
  consume character:'a'
  consume character:'c'
  consume character:'c'
  consume character:'i'
  consume character:'f'
lexed_lookahead sym:unquoted_atom, size:11
reduce sym:module_name, child_count:6
shift state:11
process version:0, version_count:1, state:11, row:3, col:9
lex_internal state:45, row:3, column:9
  consume character:'('
lexed_lookahead sym:(, size:1
reduce sym:atom, child_count:1
shift state:394
process version:0, version_count:1, state:394, row:3, col:10
lex_internal state:47, row:3, column:10
  consume character:'1'
lexed_lookahead sym:integer_token3, size:1
shift state:12
process version:0, version_count:1, state:12, row:3, col:11
lex_internal state:45, row:3, column:11
  consume character:')'
lexed_lookahead sym:), size:1
reduce sym:integer, child_count:1
reduce sym:term, child_count:1
reduce sym:pattern, child_count:1
shift state:807
process version:0, version_count:1, state:807, row:3, col:12
lex_internal state:0, row:3, column:12
  skip character:' '
  consume character:'-'
  consume character:'>'
lexed_lookahead sym:->, size:3
shift state:84
process version:0, version_count:1, state:84, row:3, col:15
lex_internal state:47, row:3, column:15
  skip character:10
  skip character:' '
  skip character:' '
  consume character:'1'
lexed_lookahead sym:integer_token3, size:4
shift state:215
process version:0, version_count:1, state:215, row:4, col:3
lex_internal state:2, row:4, column:3
  skip character:10

I've narrowed it down to that rule being the one that takes an infinity to parse. If you keep the unary operator expression one things should work fine.

So I'll be reworking that one soon :) thanks for finding this!

@wingyplus I've done 2 rewrites of the thing and I can't seem to get it to work well.

It seems that supporting both free-standing expressions and a proper module, clashes.

E.g, the Erlang expression language is embedded into the Erlang module language.

So we may need to provide 2 separate grammars.