aaditmshah/lexer

How to look for the next token?

btwael opened this issue · 3 comments

I am using your lexer with Jison, and inside a rule I want to look at what the next token is, something like this:

lexer.addRule(/a regex for identifier/, function(lexeme) {
    if(this.look() == 'IDENTIFIER') {
        return 'TYPE'
    } else {
        return 'IDENTIFIER'
    }
});

Sorry for my English!

I am trying to implement this feature; you can find my solution at #10.

You don't really need a separate look function. You can just use the lex function to get the next token. For example, this is how I would tackle your problem:

var lexer = new Lexer;

lexer.addRule(/\s+/, function () {
    // skip whitespace
});

lexer.addRule(/[a-z]+/i, function (lexeme) {
    this.yytext = lexeme;
    return "IDENTIFIER";
});

// tokenize is a function declaration, so it is hoisted and can be called here
alert(tokenize("main"));
alert(tokenize("int main"));

function tokenize(input) {
    lexer.setInput(input);

    var tokens = [], token;

    while (token = lexer.lex()) {
        if (token === "IDENTIFIER") {
            var yytext = lexer.yytext, next = lexer.lex();

            if (next && next === "IDENTIFIER") {
                tokens.push("[TYPE " + yytext + "]");
                token = next;
            }
        }

        tokens.push("[" + token + " " + lexer.yytext + "]");
    }

    return tokens.join(" ");
}

See the demo: http://jsfiddle.net/v0s70Lww/

As you can see, the action for each rule just returns a token without any additional processing. The logic that distinguishes IDENTIFIER from TYPE tokens based on context lives in the tokenize function.

This is important. If we called either look or lex inside a rule's action, we might run into a recursive loop, which could lead to unexpected results. Hence it's always better to do all the processing inside the while loop in the tokenize function.

The point is that you don't really need a look function. With the right abstraction you could just write all your processing logic within the tokenize function.
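If you still want something that reads like a look call, one option is to buffer a single token outside the rules, so no action ever re-enters the lexer. The following is only a sketch: withLookahead, peek, tokenizeWithPeek and makeStubLexer are hypothetical names, not part of this library, and the stub stands in for a real Lexer (it mimics only setInput, lex and yytext) so that the example runs on its own.

```javascript
// Hypothetical wrapper: buffers one token from any object that exposes
// setInput(), lex() and yytext, so callers can peek without consuming.
function withLookahead(lexer) {
    var buffered = null; // { token, yytext } while a token is buffered

    function read() {
        var token = lexer.lex();
        return { token: token, yytext: lexer.yytext };
    }

    return {
        yytext: undefined,
        setInput: function (input) {
            buffered = null;
            lexer.setInput(input);
            return this;
        },
        // Return the next token without consuming it.
        peek: function () {
            if (!buffered) buffered = read();
            return buffered.token;
        },
        // Return the buffered token if there is one, else read a new one.
        lex: function () {
            var entry = buffered || read();
            buffered = null;
            this.yytext = entry.yytext;
            return entry.token;
        }
    };
}

// Hypothetical stand-in for a real Lexer: every run of letters is an IDENTIFIER.
function makeStubLexer() {
    var words = [], index = 0;
    return {
        yytext: undefined,
        setInput: function (input) {
            words = input.match(/[a-z]+/gi) || [];
            index = 0;
            return this;
        },
        lex: function () {
            if (index >= words.length) return undefined; // end of input
            this.yytext = words[index++];
            return "IDENTIFIER";
        }
    };
}

// Same idea as tokenize above, written with peek instead of a manual second lex.
function tokenizeWithPeek(source, input) {
    source.setInput(input);

    var tokens = [], token;

    while (token = source.lex()) {
        // An identifier directly followed by another identifier is a type name.
        if (token === "IDENTIFIER" && source.peek() === "IDENTIFIER") {
            token = "TYPE";
        }
        tokens.push("[" + token + " " + source.yytext + "]");
    }

    return tokens.join(" ");
}
```

With the stub, tokenizeWithPeek(withLookahead(makeStubLexer()), "int main") gives "[TYPE int] [IDENTIFIER main]". The wrapper keeps the setInput, lex and yytext names, so it might slot in wherever a plain lexer object is expected, but I have not tried it with a generated parser.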

Thank you @aaditmshah, I like your solution. It's the perfect method for a hand-written project (I will save it in my snippets library for my next projects), but since I am using a parser generator, I will probably use look, which works for me now.
We now have two possible solutions, so I will close this issue if you don't mind.
Thank you!