lark-parser/lark_cython

Meta is always empty while using cython lark

ernestoarbitrio opened this issue · 9 comments

This is my parser:

    lark_kwargs = {
        "parser": "lalr",
        "lexer": "contextual",
        "propagate_positions": True,
        "_plugins": lark_cython.plugins,
    }
    lang_parser = Lark(lang_def, start="start", **lark_kwargs)

I'd like to propagate the line positions, without cython-lark it works fine, but once i enable the plugins the meta value is always empty.

Do you have any suggestion on what's wrong ?

Bug confirmed.

This is happening because lark-cython still uses Lark's parse-tree-builder, which in turn uses isinstance(..., Token), which only matches Lark Tokens, instead of lark-cython Tokens.

See this line: https://github.com/lark-parser/lark/blob/master/lark/parse_tree_builder.py#L74

The following change works:

            elif hasattr(c, 'line'): #isinstance(c, Token):
                return c

But that would break backwards compatibility.

And unfortunately lark-cython cannot inherit from lark.Token.

So I'm not sure what's the best way to solve this. Another option is to update lark-cython to use its own parse-tree-builder.

Thanks for letting me know! Sorry but i do not understand what do you mean for

Another option is to update lark-cython to use its own parse-tree-builder.

Could you explain me?

Right now lark-cython uses Lark's regular ParseTreeBuilder class. If that class is copied and adapted into lark-cython, that won't be a problem anymore. (and if done correctly, should be a little faster too)

oh gotcha ... do you wanna me try to do this?

It's not so simple, it also requires changes in the Lark codebase, to allow it as a "plugin" argument.

Probably the easiest solution would be to allow a new optional attribute to mark an object as a 'meta', like __lark_meta__ = True. Then it only requires minimal changes to Lark, and only to the PropagatePositions class.

You might need to add this attribute to both Token and Tree.

yes I do not have enough experience on lark. Even the last solution probably requires more knowledge. I would be happy to help but I think I need help as well :D

Okay, so I just created a PR for Lark, have a look: lark-parser/lark#1203

With this change, all you need is to add this to lark_cython.Token

    def __lark_meta__(self):
        return self

And for Tree return self.meta.

Then add some tests to lark-cython, and that's it. Makes sense?

Okay, so I just created a PR for Lark, have a look: lark-parser/lark#1203

With this change, all you need is to add this to lark_cython.Token

    def __lark_meta__(self):
        return self

And for Tree return self.meta.

Then add some tests to lark-cython, and that's it. Makes sense?

yes I'll try. thx