parser hang when typing integer value in function body

Question

parser hang when typing integer value in function body

wingyplus opened this issue 4 years ago · 10 comments

Consider screenshot below:

After I typing 1 in function body, the parser got hang and cpu was increased. It's happens on tree-sitter web-ui also.

Answer 1 · 2020-12-10T11:39:34.000Z

Reproduced by test corpus:

================
Function clause bug
================

-module(amod).

fibonacci(1) ->
    1

---
(source_file ())

The result from tree-sitter test:

$ yarn tree-sitter test -f 'Function clause bug'
comments:
expr_function_call:
term_string:
macros:
term_char:
expr_list:
advent_of_code_2020:
term_tuple:
otp_lists:
term_atom:
module_attribute:
expr_case:
expr_exceptions:
expr_bit_string:
term_float:
term_integer:
expr_map:
expr_send:
types:
term_binary_string:
expr_receive:
expr_variable:
term_fun:
term_list:
term_record:
expr_if:
function_clause_bug:
<< hang

Answer 2 · 2020-12-10T14:34:30.000Z

Thanks! I'll add that test case and look into it 🙌🏼

Btw is that neovim on your screenshot?

Answer 3 · 2020-12-10T15:45:08.000Z

@Ostera yes. The left hand side is nvim treesitter playground plugin. :)

Answer 4 · 2020-12-10T15:48:22.000Z

Sweet!

What would the expected behavior be here? I presume you'd want a "MISSING" error.

Answer 5 · 2020-12-10T16:09:57.000Z

Sweet!

What would the expected behavior be here? I presume you'd want a "MISSING" error.

It would be like you presume or tokenized correctly (in the test case above, it should be a number, I guess). And it should be able to continue typing not hang up.

Answer 6 · 2020-12-10T16:47:14.000Z

I found that the parser not hang up after comment $.expr_operator out in L313.

yarn run v1.22.10
$ /Users/thanabodee/src/github.com/wingyplus/tree-sitter-erlang/node_modules/.bin/tree-sitter parse amod.erl -d
(source_file [0, 0] - [4, 0]
  (module_name [0, 0] - [0, 14]
    (atom [0, 8] - [0, 12]
      value: (unquoted_atom [0, 8] - [0, 12])))
  (ERROR [2, 0] - [3, 3]
    (atom [2, 0] - [2, 9]
      value: (unquoted_atom [2, 0] - [2, 9]))
    (pattern [2, 10] - [2, 11]
      (term [2, 10] - [2, 11]
        (integer [2, 10] - [2, 11])))
    (ERROR [3, 2] - [3, 3])))
amod.erl	0 ms	(ERROR [2, 0] - [3, 3])
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

NOTE: the amod.erl is the example from my screenshot.

Answer 7 · 2020-12-10T18:04:33.000Z

Hm, the way we're using the choice operator may be too expensive / run too many branches?

Could you try this approach where we'd be flattening out the branching (right now its in 2 levels for us)?

so instead of

expr_operator_binary: ($) =>
  choice(
    prec.left(
      PREC.BINARY_OP,
      seq(
        field("lhs", $.expression),
        field("operator", oneOf(OP2_LEFT_ASSOC)),
        field("rhs", $.expression)
      )
    ),
    prec.right(
      PREC.BINARY_OP,
      seq(
        field("lhs", $.expression),
        field("operator", oneOf(OP2_RIGHT_ASSOC)),
        field("rhs", $.expression)
      )
    )
  ),

we'd have something like:

expr_operator_binary: ($) => choice(  
  ...[
  ...OP2_LEFT_ASSOC.map( op => [ "left", op ])
  ...OP2_RIGHT_ASSOC.map( op => [ "right", op ])
  ].map( [dir, op] => 
    prec[dir](
      PREC.BINARY_OP,
      seq(
        field("lhs", $.expression),
        field("operator", op),
        field("rhs", $.expression)
      )
    )
  ),

Answer 8 · 2020-12-11T05:54:40.000Z

It's still not working:

diff --git a/grammar.js b/grammar.js
index b4205ba..5c1d984 100644
--- a/grammar.js
+++ b/grammar.js
@@ -405,20 +405,17 @@ module.exports = grammar({

     expr_operator_binary: ($) =>
       choice(
-        prec.left(
-          PREC.BINARY_OP,
-          seq(
-            field("lhs", $.expression),
-            field("operator", oneOf(OP2_LEFT_ASSOC)),
-            field("rhs", $.expression)
-          )
-        ),
-        prec.right(
-          PREC.BINARY_OP,
-          seq(
-            field("lhs", $.expression),
-            field("operator", oneOf(OP2_RIGHT_ASSOC)),
-            field("rhs", $.expression)
+        ...[
+          ...OP2_LEFT_ASSOC.map((op) => [prec.left, op]),
+          ...OP2_RIGHT_ASSOC.map((op) => [prec.right, op]),
+        ].map(
+          ([_prec, op]) => _prec(
+            PREC.BINARY_OP,
+            seq(
+              field("lhs", $.expression),
+              field("operator", op),
+              field("rhs", $.expression)
+            )
           )
         )
       ),

$ yarn tree-sitter generate && yarn tree-sitter parse amod.erl -d
yarn run v1.22.10
$ /Users/thanabodee/src/github.com/wingyplus/tree-sitter-erlang/node_modules/.bin/tree-sitter generate
✨  Done in 0.77s.
yarn run v1.22.10
$ /Users/thanabodee/src/github.com/wingyplus/tree-sitter-erlang/node_modules/.bin/tree-sitter parse amod.erl -d
new_parse
process version:0, version_count:1, state:1, row:1, col:0
lex_internal state:47, row:1, column:0
  consume character:'-'
lexed_lookahead sym:-, size:1
shift state:4
process version:0, version_count:1, state:4, row:1, col:1
lex_internal state:47, row:1, column:1
  consume character:'m'
  consume character:'o'
  consume character:'d'
  consume character:'u'
  consume character:'l'
  consume character:'e'
  consume character:'m'
  consume character:'o'
  consume character:'d'
  consume character:'u'
  consume character:'l'
  consume character:'e'
lexed_lookahead sym:module, size:6
shift state:830
process version:0, version_count:1, state:830, row:1, col:7
lex_internal state:0, row:1, column:7
  consume character:'('
lexed_lookahead sym:(, size:1
shift state:539
process version:0, version_count:1, state:539, row:1, col:8
lex_internal state:0, row:1, column:8
  consume character:'a'
  consume character:'m'
  consume character:'o'
  consume character:'d'
  consume character:'a'
lexed_lookahead sym:unquoted_atom, size:4
shift state:11
process version:0, version_count:1, state:11, row:1, col:12
lex_internal state:45, row:1, column:12
  consume character:')'
lexed_lookahead sym:), size:1
reduce sym:atom, child_count:1
shift state:790
process version:0, version_count:1, state:790, row:1, col:13
lex_internal state:47, row:1, column:13
  consume character:'.'
lexed_lookahead sym:., size:1
shift state:397
process version:0, version_count:1, state:397, row:1, col:14
lex_internal state:47, row:1, column:14
  skip character:10
  skip character:10
  consume character:'f'
  consume character:'i'
  consume character:'b'
  consume character:'o'
  consume character:'n'
  consume character:'a'
  consume character:'c'
  consume character:'c'
  consume character:'i'
  consume character:'f'
lexed_lookahead sym:unquoted_atom, size:11
reduce sym:module_name, child_count:6
shift state:11
process version:0, version_count:1, state:11, row:3, col:9
lex_internal state:45, row:3, column:9
  consume character:'('
lexed_lookahead sym:(, size:1
reduce sym:atom, child_count:1
shift state:394
process version:0, version_count:1, state:394, row:3, col:10
lex_internal state:47, row:3, column:10
  consume character:'1'
lexed_lookahead sym:integer_token3, size:1
shift state:12
process version:0, version_count:1, state:12, row:3, col:11
lex_internal state:45, row:3, column:11
  consume character:')'
lexed_lookahead sym:), size:1
reduce sym:integer, child_count:1
reduce sym:term, child_count:1
reduce sym:pattern, child_count:1
shift state:807
process version:0, version_count:1, state:807, row:3, col:12
lex_internal state:0, row:3, column:12
  skip character:' '
  consume character:'-'
  consume character:'>'
lexed_lookahead sym:->, size:3
shift state:84
process version:0, version_count:1, state:84, row:3, col:15
lex_internal state:47, row:3, column:15
  skip character:10
  skip character:' '
  skip character:' '
  consume character:'1'
lexed_lookahead sym:integer_token3, size:4
shift state:215
process version:0, version_count:1, state:215, row:4, col:3
lex_internal state:2, row:4, column:3
  skip character:10

Answer 9 · 2020-12-11T08:21:36.000Z

I've narrowed it down to that rule being the one that takes an infinity to parse. If you keep the unary operator expression one things should work fine.

So I'll be reworking that one soon :) thanks for finding this!

Answer 10 · 2020-12-18T11:57:51.000Z

@wingyplus I've done 2 rewrites of the thing and I can't seem to get it to work well.

It seems that supporting both free-standing expressions and a proper module, clashes.

E.g, the Erlang expression language is embedded into the Erlang module language.

So we may need to provide 2 separate grammars.