AndrasKovacs/flatparse

BasicLambda int does not parse integers

chrisdone opened this issue · 2 comments

The example on master doesn't parse integers:

> :m + FlatParse.Examples.BasicLambda.Parser FlatParse.Basic
> runParser src' "123"
Err (Imprecise (Pos 3) [Msg "identifier",Lit "true",Lit "false",Msg "parenthesized expression",Msg "integer literal",Lit "let",Lit "lam",Lit "if"])
> runParser int "123"
Fail

I think the parser is too greedy:

int :: Parser Int
int = token $
  snd <$> chainr (\n (!place, !acc) -> (place*10,acc+place*n)) digit ((10,) <$> digit)
chainr :: (a -> b -> b) -> Parser e a -> Parser e b -> Parser e b
chainr f (Parser elem) (Parser end) = go where
  go = Parser \fp eob s -> case elem fp eob s of
    OK# a s -> case runParser# go fp eob s of
      OK# b s -> let !b' = f a b in OK# b' s
      x       -> x
    Fail# -> end fp eob s
    Err# e -> Err# e
{-# inline chainr #-}

chainr consumes elem (digit in this case) greedily until it Fails and then it tries to consume end, which of course fails because there are no more integers to parse. So either you have to make the end a no-op and use a lookahead to ensure there is at least one digit to consume, or perhaps pick another combinator over chainr. Maybe you could have chainr1 that would omit an end parser.

Gross but functional example:

int :: F.Parser ParseError Int
int = do
  _ <- F.lookahead digit
  i <- snd <$>
   (F.chainr
      (\n (!place, !acc) -> (place * 10, acc + place * n))
      digit
      (pure (1,0)))
  ws
  pure i
> F.runParser int ""
Fail
> F.runParser int "00012"
OK 12 ""

You're right. I'm inclined to go with the following now:

int :: Parser Int
int = token do
  (place, n) <- chainr (\n (!place, !acc) -> (place*10,acc+place*n)) digit (pure (1, 0))
  case place of
    1 -> empty
    _ -> pure n

My general idea in the API is that list-returning functions should be de-emphasized in favor of fusable/inlinable combinators. I'm not exactly sure about the right set of such combinators. The above int solution still looks kind of stiff.

To digress a bit, every solution we can do now has awful performance because of the Int boxing, and ideally we'd have a levity polymorphic API for at least some basic methods and combinators. I've made an issue about this: #5

I prefer your version indeed. It measures the same performance as mine, so I'll use yours.

(No change to these numbers)

benchmarking parseText/Parser2/Text/array[1000]
time                 43.24 μs   (43.02 μs .. 43.48 μs)
                     0.997 R²   (0.993 R² .. 1.000 R²)
mean                 44.10 μs   (43.07 μs .. 46.31 μs)
std dev              5.031 μs   (2.197 μs .. 9.385 μs)
variance introduced by outliers: 87% (severely inflated)

benchmarking parseText/Parser2/Text/array[2000]
time                 103.5 μs   (103.2 μs .. 103.9 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 103.8 μs   (103.5 μs .. 104.4 μs)
std dev              1.493 μs   (1.250 μs .. 1.896 μs)

benchmarking parseText/Parser2/Text/array[4000]
time                 262.6 μs   (261.5 μs .. 264.1 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 262.7 μs   (261.7 μs .. 263.8 μs)
std dev              3.554 μs   (2.634 μs .. 4.579 μs)

My general idea in the API is that list-returning functions should be de-emphasized in favor of fusable/inlinable combinators. I'm not exactly sure about the right set of such combinators. The above int solution still looks kind of stiff.

This makes sense; I also don't know a good API for e.g. parsing into a vector.

Thanks!