Add lexer

Question

Closed this issue 2 months ago · 1 comments

It is possible to add a lexer stage that processes the raw bytes, so that the Earley recognizer can work on real tokens instead.
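To make the idea concrete, here is a minimal sketch of such a lexer stage in Python. It is only an illustration, not this project's code: the token kinds, the patterns in `TOKEN_RULES`, and the `lex` function are all hypothetical, and a real grammar would supply its own terminals. It turns a byte string into a stream of tokens that a recognizer could consume instead of individual bytes.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # name of the rule that matched
    text: bytes  # the matched bytes


# Hypothetical token rules, tried in order (first match wins, not longest match).
TOKEN_RULES = [
    ("NUMBER", rb"[0-9]+"),
    ("IDENT", rb"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", rb"[+\-*/()]"),
    ("SKIP", rb"[ \t\r\n]+"),
]

# Combine all rules into one alternation of named groups.
MASTER = re.compile(
    b"|".join(b"(?P<" + name.encode() + b">" + pat + b")" for name, pat in TOKEN_RULES)
)


def lex(data: bytes) -> Iterator[Token]:
    """Yield tokens from data, dropping whitespace; raise on unknown bytes."""
    pos = 0
    while pos < len(data):
        m = MASTER.match(data, pos)
        if m is None:
            raise ValueError(f"unexpected byte at offset {pos}: {data[pos:pos + 1]!r}")
        if m.lastgroup != "SKIP":
            yield Token(m.lastgroup, m.group())
        pos = m.end()
```

For example, `list(lex(b"x1 + 42"))` yields an `IDENT`, an `OP`, and a `NUMBER` token. Every rule here is regular; the second con below is about the cases where that does not hold.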

Pros:

Cons:

It adds complexity and may further confuse the user.
The lexer may not be fully regular, which means we would still fall back to some kind of CFG.
We may already gain enough speed from eager regex caching.

Answer 1 · 2024-08-17T18:49:25.000Z

The eager regex cache is fast enough. In fact, mask_logits is currently the slowest step.
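For readers unfamiliar with the term, a logit-masking step in constrained decoding usually looks something like the sketch below. This is an assumption about the general technique, not this project's actual mask_logits: the name `mask_logits_sketch`, its signature, and the plain-list representation are hypothetical.

```python
import math
from typing import Iterable, List


def mask_logits_sketch(logits: List[float], allowed_token_ids: Iterable[int]) -> List[float]:
    """Return a copy of logits with every disallowed token set to -inf.

    Hypothetical helper: allowed_token_ids stands in for whatever set of
    token ids the recognizer accepts at the current step.
    """
    allowed = set(allowed_token_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]
```

Note that this sketch visits every vocabulary entry on each call, so its cost grows with the vocabulary size rather than with the grammar.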