Chris martin's seven parser types
Opened this issue · 4 comments
Something to aspire to:
https://twitter.com/chris__martin/status/1545102338360107008?t=MnxHlQ4zc6fzL28jX_IOzg&s=09
I've went ahead and made my in-progress Haskell project public so you can have a look.
A few things that might be worth reading:
- Subtype diagram
- Very abstract type definitions for the seven action types
- Casting operation
- The type-level function for combining two different types of action in sequence (haven't finished writing all the instances yet. Also there are several errors, don't copy from this.)
- One parsing API in progress, where you can see in the type signatures how some of the combinators change from one type of parser to another.
There are actually only two different type definitions, Any and Sure. The other distinctions aren't guaranteed by construction, they're just promises you have to make when you apply the constructors or coerce. This is surprising, because you'd expect e.g. Query to be only a Read operation on the cursor, not a State operation. But this is a streaming library, and so everything (even parsers that are not supposed to move the cursor) still have to be able to modify the state because they may need to buffer more input to produce their result. It might be possible to get really fancy with a more complex model to improve on this situation, but I'm not sure it's worth it. We still provide sufficient safety for the end user of the parsing library, who never uses the constructors directly anyway.
One noteworthy observation that I hadn't realized upfront when I started this is that traditional applicative/monadic style goes right out the window if you introduce this many types, for a few reasons:
- I suspect you'd be constantly casting up to get two parsers to be the same type to be able to
<*>or>>=them (like the experience oflifting monad transformers, but worse) MoveandAtomicMoveare not Applicative, because they don't supportpure.- To make
Atombe Applicative, the sequencing operations would have to implicitly add backtracking, and I'm not sure that's a great idea.
To get back the convenience of monadic parsing, I defined my own >>=, <*>, etc. operators that make use of the KindJoin type family.
type family KindJoin (k1 :: ActionKind) (k2 :: ActionKind) :: ActionKind
type a :> b = KindJoin a b -- its infix alias is convenient sometimesMy parser API then has a handful of functions like this that replace the standard Applicative and Monad ones:
(>>=) ::
Monad m => IsAction k1 => IsAction k2 => IsAction k3 => ActionJoin k1 k2 => k1 :> k2 ~ k3 =>
Parser text k1 m a -> (a -> Parser text k2 m b) -> Parser text k3 m bThis isn't so bad if we make use of the QualifiedDo extension introduced in GHC 9.0, which allows the built-in do syntax to use our custom (>>=) et al. operators instead of the standard functions. (Does purescript have something like this?) You can see it in use in the test spec
Anyway, caveats aside, I've been obsessed with this for a week now, a good bit of it is working, and I think it shows promise 😄
Oof, I wrote a lot. Going to copy some of this text into discussions on my own repo.
Thanks for the writeup @chris-martin !
(Purescript does have qualified do).
Update - I've still been noodling on this library, and seven is down to five.
I've removed the parser types that include a movement guarantee. I had no way to codify this property in the type definitions, and so it relied entirely on unsafe casts to indicate that I know that a parser will always advance if it succeeds. I had been hoping that just a few primitive parsers would need unsafe casts, and that the information would end up getting propagated upward through the KindJoin family, but this turned out not to be the case very often, and so it has just required inserting a bunch of unsafe casts everywhere to assert that parsers have guaranteed movement. And all for dubious benefit; it seems like the "many" combinator is essentially the only place where movement guarantee is needed.
We are left with...
Four core parser types:
Sure- read/writeSureQuery- read-onlyAny- read/write, returns EitherQuery- read-only, returns Either
And the composition:
Atom- a Query that returns a Sure
Nice.