Allow polymorphic state in Stateful parser
dminuoso opened this issue · 4 comments
Right now, Stateful parsers have a monomorphic Int# state. While that is enough to carry basic parser state, such as which parser mode we are currently in, it is not enough when one wants to collect information such as a map of known identifiers, or to construct a result during parsing. In our case this means we have to parse into an intermediate language and interpret that instead, whereas with megaparsec we would simply build the tree during parsing.
I've done a simple copy of the Stateful parser, making it polymorphic over state.
https://github.com/dminuoso/flatparse/tree/polymorphic-state
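To sketch the difference (hypothetical signatures for illustration only; the branch's actual spelling may differ):

```haskell
-- FlatParse.Stateful today (roughly): the threaded state is hard-wired to Int.
get    :: Parser r e Int
put    :: Int -> Parser r e ()
modify :: (Int -> Int) -> Parser r e ()

-- With a polymorphic state (hypothetical `ParserS s r e a`, not the branch's
-- actual names): the state can be e.g. a map of known identifiers, or the
-- result tree being built up during parsing.
get    :: ParserS s r e s
put    :: s -> ParserS s r e ()
modify :: (s -> s) -> ParserS s r e ()
```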
Benchmarks suggest that, strangely enough, it is consistently faster than the Int#-monomorphized state in the sexp and basic benchmarks, and slightly slower in the numeral benchmark. The first two cases probably warrant some analysis of the Core.
benchmarked sexp/fpstateful
time 3.156 ms (3.150 ms .. 3.167 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 3.174 ms (3.169 ms .. 3.180 ms)
std dev 16.21 μs (13.06 μs .. 20.49 μs)
benchmarked sexp/fppolystateful
time 2.932 ms (2.899 ms .. 2.966 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 2.965 ms (2.951 ms .. 2.981 ms)
std dev 49.66 μs (41.52 μs .. 60.77 μs)
benchmarked long keyword/fpstateful
time 237.6 μs (227.6 μs .. 246.8 μs)
0.994 R² (0.991 R² .. 0.999 R²)
mean 230.7 μs (228.9 μs .. 233.5 μs)
std dev 7.238 μs (4.717 μs .. 10.35 μs)
variance introduced by outliers: 13% (moderately inflated)
benchmarked long keyword/fppolystateful
time 226.9 μs (225.4 μs .. 229.8 μs)
0.999 R² (0.998 R² .. 1.000 R²)
mean 226.8 μs (226.1 μs .. 228.1 μs)
std dev 2.950 μs (1.461 μs .. 4.728 μs)
benchmarked numeral csv/fpstateful
time 966.9 μs (954.9 μs .. 978.4 μs)
0.999 R² (0.998 R² .. 0.999 R²)
mean 975.5 μs (968.9 μs .. 980.9 μs)
std dev 20.86 μs (18.56 μs .. 23.16 μs)
benchmarked numeral csv/fppolystateful
time 1.035 ms (1.014 ms .. 1.056 ms)
0.998 R² (0.996 R² .. 0.999 R²)
mean 986.1 μs (980.2 μs .. 994.4 μs)
std dev 24.08 μs (17.61 μs .. 34.36 μs)
What are your thoughts?
P.S. I am surprised that sexp/fpbasic takes almost 30% more time than the stateful variants. Any insights here?
Thanks, I'll check out your branch. The performance looks pretty close (perhaps within margin of error), but IIRC the benchmarks don't actually use the threaded state. We should look at some benchmarks where the state is used non-trivially.
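For example, something along these lines would already exercise it (a rough sketch; `countingAtom` is a made-up helper using the current Int state via FlatParse.Stateful's modify):

```haskell
import qualified FlatParse.Stateful as FP

-- Bump the threaded Int state on every successful run of `p`, e.g. to
-- count atoms while parsing the sexp benchmark input.
countingAtom :: FP.Parser r e a -> FP.Parser r e a
countingAtom p = do
  a <- p
  FP.modify (+ 1)
  pure a
```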
With GHC < 9.4, the polymorphic state would be significantly worse than the monomorphic one, because those GHC versions are not able to do nested unboxing in return types, so an Int state would generally remain boxed. It would be worth confirming, though, that the unboxing actually works under GHC 9.4.
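Roughly the shape involved (an illustration only, not flatparse's actual internal types):

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (Addr#, Int#)

-- Monomorphic state: the step result carries an Int# that is unboxed by
-- construction.
type StepMono   = Addr# -> Int# -> (# Addr#, Int# #)

-- Polymorphic state instantiated to s ~ Int: the Int returned inside the
-- unboxed tuple is what GHC < 9.4 fails to unbox, costing an allocation
-- per step.
type StepPoly s = Addr# -> s -> (# Addr#, s #)
```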
I also note that, for collecting identifiers or information related to the current scope, it is generally not necessary to use the state; the reader parameter is sufficient for that. In short, if we only have lexically nested scopes (which covers the great majority of programming languages), then scoping information can be stored in the reader. The reader also has the advantage that it can be unboxed reliably under any reasonable GHC version.
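Something like this, for instance (a minimal sketch against FlatParse.Stateful's ask and local; Scope, withBound and inScope are made-up names):

```haskell
import qualified Data.Set           as S
import qualified FlatParse.Stateful as FP

type Scope = S.Set String

-- Run a sub-parser with one more identifier in scope. The binding goes out
-- of scope automatically when the sub-parser returns, matching lexical
-- nesting.
withBound :: String -> FP.Parser Scope e a -> FP.Parser Scope e a
withBound x = FP.local (S.insert x)

-- Check whether an identifier is visible in the current scope.
inScope :: String -> FP.Parser Scope e Bool
inScope x = S.member x <$> FP.ask
```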
We have a simple solution, then: make it levity-polymorphic over the state, and provide SPECIALIZE pragmas for s ~ Int# on every binding that is not inlined. Best of both worlds?
I've pushed a commit to the repo to reflect this idea.
Hah, it turns out there's apparently a GHC bug: SPECIALIZE does not seem to work with levity-polymorphic things. But the number of non-inlined bindings is small as it is.
Disregard all of that.
PolyKinds does not kick in the way I thought it would: it turns out you cannot have levity-polymorphic binders, which makes a bunch of the relevant functions impossible to implement.
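For the record, this is the usual representation-polymorphism restriction in GHC, nothing specific to flatparse (illustrative snippet, not code from the branch):

```haskell
{-# LANGUAGE DataKinds, PolyKinds, RankNTypes, KindSignatures #-}
import Data.Kind (Type)
import GHC.Exts  (TYPE, RuntimeRep)

-- Accepted: the representation-polymorphic type occurs only in result
-- position, as in the standard type of ($).
apply :: forall (r :: RuntimeRep) (a :: Type) (b :: TYPE r). (a -> b) -> a -> b
apply f x = f x

-- Rejected if uncommented: the binder `x` would have a representation-
-- polymorphic type, so GHC cannot decide how to pass or store it. A
-- state-polymorphic `put :: s -> Parser ...` hits exactly this wall once
-- `s` ranges over arbitrary runtime representations.
--
-- levityId :: forall (r :: RuntimeRep) (a :: TYPE r). a -> a
-- levityId x = x
```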