DanielOliver/FLexer

Curious about performance vs fsyacc/lexx

Closed this issue · 4 comments

Hi, Im just curious about the relative performance vs fsyacc/lexx and maybe the error reporting capability comparison?

Performance comparison is an excellent question that I don't have an answer for immediately.

Error reporting is a little sparse right now beyond seeing how far the tokens could be read. The current example from the README here illustrates the lack of information.

// Rejected "SELECT Column1, Column2,,,NoColumn FROM Contacts"
// 
// LookaheadFailure
// 
// --  Consumed Tokens  ----------------------------------------------------------------
//  StartChar  |     EndChar  |                  Text  |                  Classification
// -------------------------------------------------------------------------------------
//          0  |           5  |                SELECT  |  Select

I have some features regarding better error reporting in development, as well as making it easier to write more performant code. I can't say that FLexer will ever be always super fast, but it should be easy to get right.

FLexer is ultimately a recursive descent parser with infinite backtrack (a depth first search). With this approach, I'm putting the responsibility of being optimal on the developer. So, the happy path will be "right first try"; however, the potential to backtrack to the worst case still exists.

I can't promise anything on performance comparisons, but I'll ping you back here when I get better error reporting added in a few weeks.

I've just made a change so that the errors with the longest parse length are returned. It's not amazing error reporting, but it's a step in the right direction.

Every Failure to parse returns the structure below.

type ClassifierError<'t> =
{ LastStatus: ClassifierStatus<'t>
TokenizerError: Tokenizer.TokenizerError option
}

The LastStatus field contains the information below. The TokenizerError field isn't used very well right now and is inconsistent, so don't rely on that.

type ClassifierStatus<'t> =
{ Consumed: Tokenizer.Token<'t> list
ConsumedWords: string list
CurrentChar: int
Remainder: string
}

With the longest error being returned, by printing out the ConsumedWords list as shown in the FLexer.Example project, I can see that the Parser got this far:

  ***   ***   ***   ***   ***   ***   ***   ***   ***   ***   ***   ***   ***   ***
-------------------------------------------------------------------------------------
Rejected "SELECT Column1, Column2,,,NoColumn FROM Contacts"

LookaheadFailure

--  Consumed Text  ------------------------------------------------------------------
                          Text     Length
-------------------------------------------------------------------------------------
                        SELECT  |           6
                  (Whitespace)  |           1
                       Column1  |           7
                             ,  |           1
                  (Whitespace)  |           1
                       Column2  |           7
                             ,  |           1

Ideally, FLexer should be able to support custom error messages being returned from any point, but I haven't quite been able to make that happen yet.