purescript-contrib/purescript-parsing

Combinator variations

jamesdbrock opened this issue · 12 comments

We need to figure out which combinators to keep and which are redundant after the CPS refactor in #154.

Here is what the benchmarks from #176 (comment) show.

  • List.manyRec is faster than List.many. Let's export a Combinators.many = List.manyRec (see the sketch after this list).
  • sepByRec is faster than sepBy. sepBy is not stack-safe. Let's replace sepBy with sepByRec.
  • chainlRec is faster than chainl. chainl is not stack-safe. Let's replace chainl with chainlRec.
  • chainr is faster than chainrRec. chainr is not stack-safe. Not clear what to do here.
  • manyTillRec is faster than manyTill. manyTill is not stack-safe. Let's replace manyTill with manyTillRec.
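For concreteness, here is a minimal sketch of that first proposal (assuming it lands in Parsing.Combinators, and reusing manyRec's own signature):

module Parsing.Combinators
  ( many
  ) where

import Control.Alternative (class Alternative)
import Control.Monad.Rec.Class (class MonadRec)
import Data.List (List)
import Data.List as List

-- Alias the faster, stack-safe Data.List.manyRec under the familiar name.
many :: forall f a. MonadRec f => Alternative f => f a -> f (List a)
many = List.manyRec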

Also, why are some of the combinators still blowing the stack?

Ideally the non-Rec versions should still be stack-safe. I don't think we should replace them without understanding why they blow the stack.
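For reference, the non-Rec many in Data.List is ordinary guarded recursion (paraphrased below), so its stack safety hinges entirely on whether the underlying Apply and Alt instances are themselves stack-safe:

import Prelude

import Control.Alt ((<|>))
import Control.Alternative (class Alternative)
import Control.Lazy (class Lazy, defer)
import Data.List (List(..))

-- Paraphrased from Data.List: each parsed element nests another
-- <*> / <|> frame, so this is only stack-safe if those instances are.
some :: forall f a. Alternative f => Lazy (f (List a)) => f a -> f (List a)
some v = Cons <$> v <*> defer (\_ -> many v)

many :: forall f a. Alternative f => Lazy (f (List a)) => f a -> f (List a)
many v = some v <|> pure Nil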

I can't replicate the stack issues with sepBy. When I run this program:

module Main where

import Prelude

import Data.Array as Array
import Data.List (List)
import Data.String (joinWith)
import Effect (Effect)
import Effect.Class.Console (logShow)
import Parsing (Parser, runParser)
import Parsing.Combinators (sepBy)
import Parsing.String (string)

testString :: String
testString = joinWith " " $ Array.replicate 100000 "A"

testParser :: Parser String (List String)
testParser = sepBy (string "A") (string " ")

main :: Effect Unit
main = do
  let res = runParser testString testParser
  logShow res

I get the full result printed. My guess would be that there is a stack issue elsewhere, not necessarily in the actual execution of the parser.

Oh, interesting! I think it's because the parser actually fails to parse. You have "23" repeated, and are using sepBy, which will result in a parse error. Your original example is blowing the stack in the throw branch of alt. We need to bounce the throw continuation.
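"Bouncing" the throw continuation means returning it as data and letting a tail-recursive driver invoke it, rather than calling it directly and nesting a stack frame per alternative. A minimal sketch of the general technique (illustrative only, not the library's actual internals):

import Prelude

-- Instead of the failure path calling its continuation directly,
-- it returns the continuation wrapped in a constructor.
data RunStep a
  = Finished a
  | Bounce (Unit -> RunStep a)

-- The driver unwinds bounces iteratively; PureScript compiles this
-- self-recursion in tail position to a loop, so the stack stays flat.
run :: forall a. RunStep a -> a
run (Finished a) = a
run (Bounce k) = run (k unit)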

@natefaubion fixed the stack safety of alt and bind in #178.

I fixed the parse errors that you noticed; here are the new benchmarks: #179 (comment)

  • List.manyRec is faster than List.many. Let's export a Combinators.many = List.manyRec.
  • sepByRec and sepBy appear to be about the same speed. Let's delete sepByRec?
  • chainlRec is faster than chainl. Let's replace chainl with chainlRec?
  • chainr is faster than chainrRec. Let's delete chainrRec?
  • manyTillRec is faster than manyTill. Let's replace manyTill with manyTillRec?

Or maybe we should try to do #155 first before we make these decisions.

I fixed the parse errors that you noticed; here are the new benchmarks: #180 (comment)

  • List.manyRec is faster than List.many. Let's export a Combinators.many = List.manyRec.
  • sepByRec is faster than sepBy. Let's replace sepBy with sepByRec?
  • chainlRec is faster than chainl. Let's replace chainl with chainlRec?
  • chainr is faster than chainrRec. Let's delete chainrRec?
  • manyTillRec is faster than manyTill. Let's replace manyTill with manyTillRec?

Or maybe we should try to do #155 first before we make these decisions.

I think what I'm seeing in #181 (comment) is that:

  • The fastest way to write these combinators is in terms of tailRecM, which is how sepEndBy1Rec is written (see the sketch after this list).
  • The second fastest is to write them in terms of List.manyRec, which is how sepByRec is written. (List.manyRec is itself written in terms of tailRecM.)
  • It's possible that we could squeeze out even more speed by writing these combinators CPS-style, as I think @natefaubion suggested in #155? But if we did that, then we would still be using tailRecM, I think?
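Here is the tailRecM shape those Rec combinators share, as a sketch of the style (hypothetical name manyLoop, not the library's exact source):

import Prelude

import Control.Alt ((<|>))
import Control.Alternative (class Alternative)
import Control.Monad.Rec.Class (class MonadRec, Step(..), tailRecM)
import Data.List (List(..), reverse)

-- Each pass either consumes one item and loops, or stops and returns
-- the reversed accumulator; tailRecM keeps the whole loop stack-flat.
manyLoop :: forall f a. MonadRec f => Alternative f => f a -> f (List a)
manyLoop p = tailRecM go Nil
  where
  go acc =
    (p <#> \x -> Loop (Cons x acc))
      <|> pure (Done (reverse acc))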

I still don't understand why chainr is faster than chainrRec: #181 (comment)

<|> pure (Done $ foldl apply last init)

chainrRec is probably slower because the foldl above is being strictly computed on every iteration, making it quadratic: PureScript is strict, so the pure branch's fold is evaluated every time the <|> expression is built, even on iterations where the loop branch succeeds. It needs to be deferred, or guarded in some way.
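A reduced illustration of the trap and one possible fix via the parser's Lazy instance (a hypothetical example, not a patch to chainrRec):

import Prelude

import Control.Alt ((<|>))
import Control.Lazy (defer)
import Control.Monad.Rec.Class (Step(..), tailRecM)
import Data.Foldable (foldl)
import Data.List (List(..), (:))
import Parsing (Parser)
import Parsing.String (string)

-- One parse step per iteration, with the final result folded from the
-- accumulator, mirroring the shape quoted above.
sumSteps :: Parser String Int
sumSteps = tailRecM go Nil
  where
  go acc =
    (string "x" *> pure (Loop (1 : acc)))
      -- The quadratic trap: written as
      --   <|> pure (Done (foldl (+) 0 acc))
      -- the fold is evaluated eagerly on every iteration, because <|>'s
      -- right argument is built before we know the loop will continue.
      -- Deferring it means the fold runs once, when the loop ends:
      <|> defer \_ -> pure (Done (foldl (+) 0 acc))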