DmitrySoshnikov/syntax

rust plugin: wrong type annotation for args

namiwang opened this issue · 5 comments

Hi, I'm trying to implement an experimental parser with a custom tokenizer, in Rust, via syntax-cli.

Here's a simple grammar (partially borrowed from Ruby's):

     0. $accept -> program
    -----------------------
     1. program -> top_compstmt
     2. top_compstmt -> top_stmts opt_terms
     3. top_stmts -> top_stmt
     4. top_stmt -> stmt
     5. stmt -> expr
     6. expr -> arg
     7. arg -> primary
     8. primary -> literal
     9. literal -> numeric
    10. numeric -> simple_numeric
    11. simple_numeric -> tINTEGER
    12. opt_terms -> terms
    13. term -> tNL
    14. terms -> term

┌────────────────┬───────────┐
│ Symbol         │ First set │
├────────────────┼───────────┤
│ $accept        │ tINTEGER  │
├────────────────┼───────────┤
│ program        │ tINTEGER  │
├────────────────┼───────────┤
│ top_compstmt   │ tINTEGER  │
├────────────────┼───────────┤
│ top_stmts      │ tINTEGER  │
├────────────────┼───────────┤
│ top_stmt       │ tINTEGER  │
├────────────────┼───────────┤
│ stmt           │ tINTEGER  │
├────────────────┼───────────┤
│ expr           │ tINTEGER  │
├────────────────┼───────────┤
│ arg            │ tINTEGER  │
├────────────────┼───────────┤
│ primary        │ tINTEGER  │
├────────────────┼───────────┤
│ literal        │ tINTEGER  │
├────────────────┼───────────┤
│ numeric        │ tINTEGER  │
├────────────────┼───────────┤
│ simple_numeric │ tINTEGER  │
├────────────────┼───────────┤
│ tINTEGER       │ tINTEGER  │
├────────────────┼───────────┤
│ opt_terms      │ tNL       │
├────────────────┼───────────┤
│ terms          │ tNL       │
├────────────────┼───────────┤
│ term           │ tNL       │
├────────────────┼───────────┤
│ tNL            │ tNL       │
└────────────────┴───────────┘

And some productions:

...

top_compstmt
    : top_stmts opt_terms {
        |$1: Node; $2: Token| -> Node;

        $$ = Node::Dummy;
    }
;

...

Would produce handlers like:

enum SV {
    Undefined,
    _0(Token),
    _1(Node)
}

...

fn _handler2(&mut self) -> SV {
    // Semantic values prologue.
    let mut _1 = pop!(self.values_stack, _1);
    let mut _2 = pop!(self.values_stack, _0);

    let __ = Node::Dummy;
    SV::_1(__)
}

...

The issue I encountered is that, at the beginning of top_compstmt (aka _handler2), the values stack looks like this:

[
    _1(Dummy),
    _0(Token { kind: 15, value: "\n", ... })
]

That seems legit to me: the first is the reduced result value for top_stmts <- ... <- tINTEGER, and the second one is the result of opt_terms <- ... <- tNL.

Then the statement let mut _1 = pop!(self.values_stack, _1); assumes that a _1(Node) is on top of the stack, while in reality the top of the stack is _0(Token). Hence the issue.
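To make the mismatch concrete, here is a minimal, self-contained sketch. It is not the code that syntax-cli generates: Node, Token, and SV are simplified stand-ins, and pop_node / pop_token are hypothetical helpers that return None on a variant mismatch, where the real pop! macro would fail instead.

```rust
// Simplified stand-ins for the types in the generated parser (assumptions).
#[derive(Debug, PartialEq)]
enum Node {
    Dummy,
}

#[derive(Debug, PartialEq)]
struct Token {
    kind: i32,
    value: String,
}

#[allow(dead_code, non_camel_case_types)]
#[derive(Debug, PartialEq)]
enum SV {
    Undefined,
    _0(Token),
    _1(Node),
}

// Hypothetical helper: pops the top value and expects the `_1(Node)` variant.
fn pop_node(stack: &mut Vec<SV>) -> Option<Node> {
    match stack.pop() {
        Some(SV::_1(node)) => Some(node),
        _ => None,
    }
}

// Hypothetical helper: pops the top value and expects the `_0(Token)` variant.
fn pop_token(stack: &mut Vec<SV>) -> Option<Token> {
    match stack.pop() {
        Some(SV::_0(token)) => Some(token),
        _ => None,
    }
}

// The values stack as observed on entry to the handler:
// the Node was pushed first, the Token last (so the Token is on top).
fn stack_before_handler() -> Vec<SV> {
    vec![
        SV::_1(Node::Dummy),
        SV::_0(Token { kind: 15, value: "\n".to_string() }),
    ]
}

// Pops $1 first, as the generated prologue does: the top is a Token, so this fails.
fn pop_in_generated_order() -> Option<(Node, Token)> {
    let mut stack = stack_before_handler();
    let _1 = pop_node(&mut stack)?;
    let _2 = pop_token(&mut stack)?;
    Some((_1, _2))
}

// Pops $2 first, then $1: matches the LIFO order of the stack.
fn pop_in_reverse_order() -> Option<(Node, Token)> {
    let mut stack = stack_before_handler();
    let _2 = pop_token(&mut stack)?;
    let _1 = pop_node(&mut stack)?;
    Some((_1, _2))
}
```

In this sketch, pop_in_generated_order() returns None because the first pop sees the Token, while pop_in_reverse_order() returns both values.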

So do you think this is an issue in syntax-cli or somewhere else in my implementation? Thanks.

@namiwang, thanks for reporting. The Rust plugin is currently experimental and might have bugs. If you could attach an isolated example of a smaller grammar with tokenizer rules that shows the issue, it will be easier to debug.

@DmitrySoshnikov

Hi! I just put together a smaller demo, adapted from example/calc-ast.rs.g.

%lex

%%

\s+         /* skip whitespace */ return "";

";"         return "SEMI";

\d+         return "NUMBER";

"+"         return "+";
"*"         return "*";

"("         return "(";
")"         return ")";

/lex

%left +
%left *

%{

#[derive(Debug)]
pub enum Node {

    Literal(i32),

    Binary {
        op: &'static str,
        left: Box<Node>,
        right: Box<Node>,
    },

    Stmt {
        expr: Box<Node>
    }
}



%}

%%

Stmt
    : Expr Terminator {
        |$1: Node; $2: Token| -> Node;

        $$ = Node::Stmt {
            expr: Box::new($1)
        }
    }
;

Terminator
    : SEMI
;

Expr
    : Expr + Expr {

        // Types of used args ($1, $2, ...), and return type:
        |$1: Node; $3: Node| -> Node;

        $$ = Node::Binary {
            op: "+",
            left: Box::new($1),
            right: Box::new($3),
        }
    }

    | ( Expr ) {
        $$ = $2;
    }

    | NUMBER {
        || -> Node;
        let n = yytext.parse::<i32>().unwrap();
        $$ = Node::Literal(n);
    };

And the log:

thread 'parser' panicked at 'called `Option::unwrap()` on a `None` value', libcore/option.rs:345:21
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::print
             at libstd/sys_common/backtrace.rs:71
             at libstd/sys_common/backtrace.rs:59
   2: std::panicking::default_hook::{{closure}}
             at libstd/panicking.rs:211
   3: std::panicking::default_hook
             at libstd/panicking.rs:227
   4: <std::panicking::begin_panic::PanicPayload<A> as core::panic::BoxMeUp>::get
             at libstd/panicking.rs:463
   5: std::panicking::try::do_call
             at libstd/panicking.rs:350
   6: std::panicking::try::do_call
             at libstd/panicking.rs:328
   7: core::ptr::drop_in_place
             at libcore/panicking.rs:71
   8: core::ptr::drop_in_place
             at libcore/panicking.rs:51
   9: <std::collections::hash::map::RandomState as core::hash::BuildHasher>::build_hasher
             at /Users/travis/build/rust-lang/rust/src/libcore/macros.rs:20
  10: dummy::Tokenizer::to_token
             at src/lib.rs:447
  11: dummy::Tokenizer::get_next_token
             at src/lib.rs:363
  12: dummy::Tokenizer::get_next_token
             at src/lib.rs:360
  13: dummy::Tokenizer::_lex_rule6
             at src/lib.rs:632
  14: parser::parser
             at tests/parser.rs:9
  15: parser::__test::TESTS::{{closure}}
             at tests/parser.rs:6
  16: core::ops::function::FnOnce::call_once
             at /Users/travis/build/rust-lang/rust/src/libcore/ops/function.rs:223
  17: <F as alloc::boxed::FnBox<A>>::call_box
             at libtest/lib.rs:1451
             at /Users/travis/build/rust-lang/rust/src/libcore/ops/function.rs:223
             at /Users/travis/build/rust-lang/rust/src/liballoc/boxed.rs:638
  18: panic_unwind::dwarf::eh::read_encoded_pointer
             at libpanic_unwind/lib.rs:105
test parser ... FAILED

OK, I think the issue is the wrong pop order. In particular, in your example, the generated handler:

fn _handler1(&mut self) -> SV {
    // Semantic values prologue.
    let mut _1 = pop!(self.values_stack, _1);
    let mut _2 = pop!(self.values_stack, _0);

    let __ = Node::Stmt {
        expr: Box::new(_1)
    };
    SV::_1(__)
}

should pop the token first. As a quick fix to test, change the handler in the generated file to:

fn _handler1(&mut self) -> SV {
    // Semantic values prologue.
    let mut _2 = pop!(self.values_stack, _0);
    let mut _1 = pop!(self.values_stack, _1);

    let __ = Node::Stmt {
        expr: Box::new(_1)
    };
    SV::_1(__)
}

I'll look into this later; it should be a simple fix, and I'd appreciate a PR as well. Thanks for catching this!

OK, fixed in 7153b3a, and in v0.1.2.
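For illustration only, the reordering can be sketched as a tiny generator that emits the prologue lines from last argument to first. This is not the actual syntax-cli code, which is not shown in this thread: the prologue function is hypothetical, and it assumes every right-hand-side symbol has a known SV variant name.

```rust
// Hypothetical helper: builds the "semantic values prologue" for a production.
// `variants[i]` is the SV variant name assumed for argument $(i + 1).
fn prologue(variants: &[&str]) -> Vec<String> {
    variants
        .iter()
        .enumerate()
        // The last right-hand-side symbol sits on top of the values stack,
        // so its pop statement has to be emitted first.
        .rev()
        .map(|(i, variant)| {
            format!("let mut _{} = pop!(self.values_stack, {});", i + 1, variant)
        })
        .collect()
}
```

For the Stmt production above, calling prologue(&["_1", "_0"]) yields the pop for _2 followed by the pop for _1, which is the order used in the quick fix.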

Please let me know if you see any issues!

Thanks!