stencilproject/Stencil

Parsing {{% is incorrectly parsed

ilyapuchka opened this issue · 6 comments

Parsing {{%is parsed as variable node instead of tag resulting in error on parsing corresponding closing tag a sit is not balanced with opening tag.

Example:

class Some {{% if/for/anything else ... % } ... {% end... %} < unknown tag

So this case is caused by the lack of whitespace/newline controls. Adding space in the template will solve it but will leave generated code with trailing spaces, which is extra work for formatter, if any being used. Or it can be solved with moving tag to new line, but this has implications on template format, again due to lack of whitespace/newline control, or introduces unneeded new line.
There might be other cases where having block of code in {} in one line is desirable like generating code for trailing closures.
And in general this is a false negative.

djbe commented

The problem is that the current lexer is a simple, character by character, scanner.

We could add a simple lookahead to see if it can find the end of a token, for example if it finds {{, first see if it can find }} later before returning it as the start of a token. If it can't find the end, then we haven't found the start of a token. That would solve the simple use case that you mentioned, as it can't find the end for {{, so it continues, finds {% with a corresponding end.

But consider the following:

class Some {{% if true %}{{ stuff }}{% endif %}

With the basic lookahead, the scanner finds {{, searches for }} and finds it (after stuff), so it has found a token, namely {{% if true %}{{ stuff }}, which is not really what we want. At this point, we either need some smarter lookahead/behind that keeps track of these, or something like regexes, or ...

Or we can do what Jinja2 does, which is that a user should escape such delimiters (documentation):

class Some {{ '{' }}{% if true %}{{ stuff }}{% endif %}

The good news is that the latest changes in #226 support this.

Yes, we will need to make parser smarter but this can affect performance negatively as we will have to go through some string ranges repetitively.
To address the case that you mention the lookahead should maintain a stack - when new token pushed on top its corresponding ending token will be searched for unless something else is pushed on top of it.
This way can also probably improve some error messages.

Escaping is a good idea actually, it should already work like that 👍

I'll close it for now as escaping seems to be good enough solution

djbe commented

We should maybe document it though? In case someone hits this exact issue.

Yes, I'll update docs.