Dan-wanna-M/kbnf

Add lexer

Closed · 1 comment

We could add a lexer stage that turns raw bytes into tokens, so that the Earley recognizer works on real tokens instead of individual bytes.
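To make the proposal concrete, here is a minimal sketch of what a DFA-based lexer stage could look like. It is not kbnf code: the `Token` and `State` types and the `step` and `lex` functions are hypothetical names invented for illustration, and the two token classes (identifiers and numbers) are an assumed toy grammar. The lexer does maximal-munch scanning over bytes and hands whole tokens to whatever recognizer comes next.

```rust
// Hypothetical token kinds for illustration; not part of kbnf's API.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Token {
    Ident(String),
    Number(String),
}

// DFA states. `Start` is the only non-accepting state.
#[derive(Clone, Copy, PartialEq, Eq)]
enum State {
    Start,
    InIdent,
    InNumber,
}

/// One DFA transition: the next state for `byte`, or `None` if the
/// automaton has no outgoing edge from `state` on that byte.
fn step(state: State, byte: u8) -> Option<State> {
    match (state, byte) {
        (State::Start, b'a'..=b'z' | b'A'..=b'Z' | b'_') => Some(State::InIdent),
        (State::InIdent, b'a'..=b'z' | b'A'..=b'Z' | b'_' | b'0'..=b'9') => Some(State::InIdent),
        (State::Start | State::InNumber, b'0'..=b'9') => Some(State::InNumber),
        _ => None,
    }
}

/// Splits `input` into tokens using maximal munch.
/// Returns `Err(offset)` if no token can start at byte `offset`.
fn lex(input: &[u8]) -> Result<Vec<Token>, usize> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        // Skip ASCII whitespace between tokens.
        if input[pos].is_ascii_whitespace() {
            pos += 1;
            continue;
        }
        let start = pos;
        let mut state = State::Start;
        // Run the DFA until it has no transition for the next byte.
        while pos < input.len() {
            match step(state, input[pos]) {
                Some(next) => {
                    state = next;
                    pos += 1;
                }
                None => break,
            }
        }
        let text = String::from_utf8_lossy(&input[start..pos]).into_owned();
        match state {
            State::InIdent => tokens.push(Token::Ident(text)),
            State::InNumber => tokens.push(Token::Number(text)),
            // The DFA never left `Start`: no token begins at this byte.
            State::Start => return Err(start),
        }
    }
    Ok(tokens)
}
```

With this sketch, `lex(b"foo 42 bar_1")` produces an identifier, a number, and another identifier, while `lex(b"a +")` fails at byte offset 2 because no token class starts with `+`. The recognizer would then see three symbols instead of twelve bytes, which is where the speedup in the proposal would come from.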

Pros:

  • Faster, since a DFA is 5-10 times faster than the Earley recognizer.
  • Could make parsing cleaner.

Cons:

  • More complex, and it would further confuse users.
  • The lexer may not be fully regular, which means we would still have to fall back to some kind of CFG.
  • We may gain enough speed from eager regex caching alone.

The eager regex cache is fast enough. In fact, mask_logits is currently the slowest part.