Curious about performance vs fsyacc/lexx

Hi, Im just curious about the relative performance vs fsyacc/lexx and maybe the error reporting capability comparison?

Performance comparison is an excellent question that I don't have an answer for immediately.

Error reporting is a little sparse right now beyond seeing how far the tokens could be read. The current example from the README here illustrates the lack of information.

// Rejected "SELECT Column1, Column2,,,NoColumn FROM Contacts"
// 
// LookaheadFailure
// 
// --  Consumed Tokens  ----------------------------------------------------------------
//  StartChar  |     EndChar  |                  Text  |                  Classification
// -------------------------------------------------------------------------------------
//          0  |           5  |                SELECT  |  Select

I have some features regarding better error reporting in development, as well as making it easier to write more performant code. I can't say that FLexer will ever be always super fast, but it should be easy to get right.

FLexer is ultimately a recursive descent parser with infinite backtrack (a depth first search). With this approach, I'm putting the responsibility of being optimal on the developer. So, the happy path will be "right first try"; however, the potential to backtrack to the worst case still exists.

I can't promise anything on performance comparisons, but I'll ping you back here when I get better error reporting added in a few weeks.

Cool thanks!

…

On Mon, 5 Feb 2018 at 16:55, Daniel ***@***.***> wrote: I can't promise anything on performance comparisons, but I'll ping you back here when I get better error reporting added in a few weeks. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#5 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAj7yj7KXx-m431ktmpual7Eqm3vXDkyks5tRzKIgaJpZM4R0e-Y> .

I've just made a change so that the errors with the longest parse length are returned. It's not amazing error reporting, but it's a step in the right direction.

Every Failure to parse returns the structure below.

FLexer/src/FLexer.Core/Classifier.fs

Lines 28 to 31 in 9d321d6

    
           type ClassifierError<'t> = 
        
               {   LastStatus: ClassifierStatus<'t> 
        
                   TokenizerError: Tokenizer.TokenizerError option 
        
               }

The LastStatus field contains the information below. The TokenizerError field isn't used very well right now and is inconsistent, so don't rely on that.

FLexer/src/FLexer.Core/Classifier.fs

Lines 4 to 9 in 9d321d6

    
           type ClassifierStatus<'t> = 
        
               {   Consumed: Tokenizer.Token<'t> list 
        
                   ConsumedWords: string list 
        
                   CurrentChar: int 
        
                   Remainder: string 
        
               }

With the longest error being returned, by printing out the ConsumedWords list as shown in the FLexer.Example project, I can see that the Parser got this far:

  ***   ***   ***   ***   ***   ***   ***   ***   ***   ***   ***   ***   ***   ***
-------------------------------------------------------------------------------------
Rejected "SELECT Column1, Column2,,,NoColumn FROM Contacts"

LookaheadFailure

--  Consumed Text  ------------------------------------------------------------------
                          Text     Length
-------------------------------------------------------------------------------------
                        SELECT  |           6
                  (Whitespace)  |           1
                       Column1  |           7
                             ,  |           1
                  (Whitespace)  |           1
                       Column2  |           7
                             ,  |           1

Ideally, FLexer should be able to support custom error messages being returned from any point, but I haven't quite been able to make that happen yet.

	type ClassifierError<'t> =
	{ LastStatus: ClassifierStatus<'t>
	TokenizerError: Tokenizer.TokenizerError option
	}

	type ClassifierStatus<'t> =
	{ Consumed: Tokenizer.Token<'t> list
	ConsumedWords: string list
	CurrentChar: int
	Remainder: string
	}