skvadrik/re2c

EOF rules with multiple blocks

Closed this issue · 9 comments

brk commented

When using multiple blocks within a single function, the EOF rules in each block will (re-)define the same yyeof symbol. (This behavior applies to both 1.3 and HEAD). The resulting code cannot be compiled.

I took a quick look at the code in emit_action.cc but didn't see an obvious way of obtaining a block identifier. Hopefully this isn't a difficult patch to produce for someone more familiar with the codebase!
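For readers who have not hit this before, the collision comes from the language rather than from re2c: labels in C and C++ have function scope, so two lexer blocks placed in one function cannot both define a label with the same name. The sketch below is hand-written and is not actual re2c output. The function `count_non_spaces` and the labels `yyeof_a` and `yyeof_b` are made up for illustration. It imitates two blocks that each need their own end-of-input label.

```cpp
#include <cstring>

// Hand-written illustration, NOT actual re2c output.
// Labels have function scope in C and C++, so the two "blocks" below
// must use distinct EOF label names. Renaming yyeof_b to yyeof_a would
// make the compiler reject the function with a duplicate-label error.
static int count_non_spaces(const char *s)
{
    const char *cur = s;
    const char *lim = s + std::strlen(s);
    int n = 0;

    // "Block 1": skip spaces, with its own end-of-input label.
block1:
    if (cur >= lim) goto yyeof_a;
    if (*cur == ' ') { ++cur; goto block1; }
    goto block2;
yyeof_a:
    return n;

    // "Block 2": count non-space characters, with a second EOF label.
block2:
    if (cur >= lim) goto yyeof_b;
    if (*cur != ' ') { ++cur; ++n; goto block2; }
    goto block1;
yyeof_b:
    return n;
}
```

Both labels live in the same function-wide namespace, which is why a generator that emits one fixed label name per block produces code that does not compile.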

trofi commented

Can you provide a simple example of source .re code that exhibits bad code generation?

I've come up with this example (for simplicity it doesn't do buffer refilling and YYFILL always fails, but that's irrelevant to the problem):

#include <assert.h>
#include <string.h>

static int lex(const char *str, unsigned int &count)
{
    const char *YYCURSOR = str;
    const char *YYLIMIT  = YYCURSOR + strlen(str);
    count = 0;
    /*!re2c
        re2c:define:YYCTYPE = char;
        re2c:define:YYFILL = "false"; // always fails
        re2c:define:YYFILL:naked = 1;
        re2c:eof = 0;

        wsp   = [ \n]+;
        char1 = [a-zA-Z_];
        char  = char1 | [0-9];
        word  = char1 char+;
    */
loop:
    /*!re2c
        *          { return 1; }
        $          { return 0; }
        "" / char1 { goto word; }
        wsp        { goto loop; }
    */
word:
    /*!re2c
        *    { return 2; }
        $    { return 0; }
        word { ++count; goto loop; }
    */
}

int main()
{
    unsigned int count;
    assert(lex("aa bb cc", count) == 0 && count == 3);
    assert(lex("aa 1b cc", count) == 1);
    assert(lex("aa b cc", count) == 2);
    return 0;
}

To be run as:

re2c -W example.re -oexample.cpp && g++ -Wall example.cpp -oexample && ./example

Fix: 1503a74.

@brk

> I took a quick look at the code in emit_action.cc but didn't see an obvious way of obtaining a block identifier.

re2c doesn't yet have a concept of named blocks, so I added a "block ID" to the label. The number does not identify a re2c block, only some internal program block, so it carries no meaning for users. But EOF labels are not part of the stable API exposed to users, so re2c will generate nicer labels in the future.
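As a rough sketch of the idea (not the code from the commit), a label generator can fold a per-block counter into the label name so that two blocks never emit the same identifier. The helper name `eof_label`, its parameters, and the `yyeof<block>_<state>` format are all assumptions made for this example. The labels re2c really emits may look different.

```cpp
#include <string>

// Hypothetical helper, NOT re2c's real implementation: builds an EOF
// label name from a block counter and a state number. The separator
// keeps pairs such as (1, 23) and (12, 3) from mapping to one name.
static std::string eof_label(unsigned block_id, unsigned state)
{
    return "yyeof" + std::to_string(block_id) + "_" + std::to_string(state);
}
```

With this scheme the same state number in two different blocks yields two different labels, which is the property needed to avoid the redefinition.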

Also, as @trofi pointed out, it is good to provide an example (this time it was easy for me to construct one, but user examples are more versatile and make better tests).

brk commented

I apologize for not providing a reduced testcase from the start.

Thank you very much for the rapid fix!

Hey, I am still experiencing this issue (using re2c compiled at HEAD), but with a slightly more complicated example, involving the use of conditions and the EOF rule:

#include <assert.h>
#include <string.h>


/*!types:re2c */

int lex(const char* cur)
{
    const char* lim;
    const char* mrk;

    char yych;

    int condition = yycinit;

    lim = cur + strlen(cur);
loop:
    mrk = cur;

    /*!re2c
        re2c:define:YYCTYPE = char;
        re2c:define:YYLIMIT  = lim;
        re2c:define:YYCURSOR = cur;
        re2c:define:YYMARKER = mrk;
        re2c:variable:yych  = yych;
        re2c:yych:emit = 0;
        re2c:define:YYGETCONDITION = 'condition';
        re2c:define:YYGETCONDITION:naked = 1;
        re2c:define:YYSETCONDITION = 'condition = @@;';
        re2c:define:YYSETCONDITION:naked = 1;
        re2c:yyfill:enable = 0;
        re2c:eof = 0;

        <init> *                                        { return -1; }
        <init> $                                        { return  0; }

        <init> [/][/] .*                                { goto loop; }
        <init> [/][*]                   :=> comment

        <init> [ \t]+                                   { goto loop; }

        <init, comment> [\n]+                           { goto loop; }

        <comment> [^*\n]+ [*]*          :=> comment
        <comment> [^*\n]* [*]+          :=> comment
        <comment> [^*\n]* [*]+ [/]       => init        { goto loop; }

        <comment> *                     :=> comment
        <comment> $                                     { return -1; }
    */
}


int main(void)
{
    assert(!lex("/* hello, */ // world !"));

    return 0;
}

Compile:

re2c -W -Werror -c -o bug.c bug.re && gcc -o bug bug.c

Are you talking about the duplicate yyeof1 labels that are causing a compilation error?

$ re2c -W -Werror -c -o bug.c bug.re && gcc -o bug bug.c
bug.re: In function ‘lex’:
bug.c:155:1: error: duplicate label ‘yyeof1’
  155 | yyeof1:
      | ^~~~~~
bug.c:98:1: note: previous definition of ‘yyeof1’ was here
   98 | yyeof1:
      | ^~~~~~

I fixed this: 5033918. Thanks for reporting.

Note that this is a different issue from the bug discussed above.

Wow, that was fast! And yes, the duplicate yyeof1 labels were the bug I was referring to.

Thank you!

> Note that this is a different issue from the bug discussed above.

@jinscoe123 Please ignore my comment, I myself got confused and thought this was a different bug. :)

No worries :)