Resumable parsing using continuation primitives
AndrasKovacs opened this issue · 9 comments
So far, resumable parsers have really only been feasible with CPS-based internals, like in Attoparsec. Unfortunately, that has painful overhead because it effectively moves the control stack of the parsers to the heap, and continuation closure calls are also slower than native GHC returns.
GHC 9.6.x is now available in release candidates, and it has new primops for delimited continuations: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0313-delimited-continuation-primops.rst. It might be interesting to investigate the new primitives. The idea would be to save the current parser continuation when running out of input, instead of throwing an error.
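Roughly, the suspension could look like the sketch below. This is only an illustration, not flatparse's internals: the `Step`, `askMore` and `runResumable` names are made up, and the thin IO wrappers around `newPromptTag#`/`prompt#`/`control0#` follow the proposal and assume GHC 9.6+.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import qualified Data.ByteString as BS
import GHC.Exts (PromptTag#, control0#, newPromptTag#, prompt#)
import GHC.IO (IO (..))

-- Thin IO-level wrappers around the GHC 9.6 delimited continuation primops.
data PromptTag a = PromptTag (PromptTag# a)

newPromptTag :: IO (PromptTag a)
newPromptTag = IO $ \s -> case newPromptTag# s of
  (# s', tag #) -> (# s', PromptTag tag #)

prompt :: PromptTag a -> IO a -> IO a
prompt (PromptTag tag) (IO m) = IO (prompt# tag m)

control0 :: PromptTag a -> ((IO b -> IO a) -> IO a) -> IO b
control0 (PromptTag tag) f =
  IO (control0# tag (\k -> case f (\(IO m) -> IO (k m)) of IO a -> a))

-- Illustrative driver type: either the parse finished, or it ran out of input
-- and can be resumed with the next chunk.
data Step a
  = Finished a
  | NeedInput (BS.ByteString -> IO (Step a))

-- Instead of throwing an error at end of input, capture the current parser
-- continuation and hand it back to the driver. The prompt is re-installed on
-- resumption so the parser can suspend again later.
askMore :: PromptTag (Step a) -> IO BS.ByteString
askMore tag = control0 tag $ \k ->
  pure (NeedInput (\chunk -> prompt tag (k (pure chunk))))

-- Run a parser that may suspend itself via 'askMore'.
runResumable :: (PromptTag (Step a) -> IO a) -> IO (Step a)
runResumable parse = do
  tag <- newPromptTag
  prompt tag (Finished <$> parse tag)
```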
On further consideration, I don't think that Attoparsec-style resumption is useful enough. All it gets us is amortization of the cost of parsing itself, but it gives us no usable incremental results. For example, when the input arrives in a bunch of chunks, we can parse each chunk immediately as it arrives, but we get no output at all until all chunks are parsed. All we get is a potential latency improvement compared to just buffering bytestring chunks and parsing them in one go, and I find it unlikely that flatparse would be a latency bottleneck in practice.
I've just completed writing a JSON parser from scratch (GaloisInc/json#17) and I used `attoparsec` specifically because `flatparse` only works over strict `ByteString`s.
> we get no output at all until all chunks are parsed
Only in the most basic implementation. If you wrap the results into a Stream, you can make a `Parser (a, Parser (a, Parser ...))` chain of arbitrary depth. It looks a tad crude because the `Parser` needs to be run anew every time, but it works well and is composable.
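For illustration, the shape of that chain could look something like this (a hypothetical sketch; `Stream` and `drain` are made-up names, and `p` stands in for whichever parser type is used):

```haskell
-- Each step parses one element together with the parser for the rest of the
-- stream, if there is more to come.
newtype Stream p a = Stream { stepStream :: p (a, Maybe (Stream p a)) }

-- Drain the whole chain, given a way to run one parser step (for instance,
-- over the next input chunk). Every step is run anew, as described above.
drain
  :: Monad m
  => (p (a, Maybe (Stream p a)) -> m (a, Maybe (Stream p a)))
  -> Stream p a
  -> m [a]
drain run (Stream step) = do
  (a, next) <- run step
  case next of
    Nothing   -> pure [a]
    Just rest -> (a :) <$> drain run rest
```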
> All we get is potential latency improvement...
Yes, since we can't know that the input is correct until we've traversed all of it, the only benefit of a chunked parser is the potential for lower memory and latency overheads. A strict parser will always be more performant on small inputs, but it will become prohibitively expensive once we're into hundreds of megabytes of input data.
Also I don't think a chunked parser can share the foundation with the strict one. From my understanding there are two extra operations I would want:
- Input resupply. Causes the parser to return a `Partial (ByteString -> Parser)` (see the sketch after this list). I assume embedding such an option into a non-CPS parser is exactly what delimited continuations were added for;
- Lazy backtracking. Once asked for (via `(<|>)` or `lookahead`), the parser would keep track of all previous chunks, discarding references to them once safe to do so. The main benefit here is never having to reallocate the input data, and this is something `attoparsec` can't do because its internal input representation is strict.
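A hypothetical shape for the first point, following flatparse's existing `OK`/`Fail`/`Err` result style with an extra constructor; the `Partial` name is borrowed from attoparsec and is not an existing flatparse API:

```haskell
import qualified Data.ByteString as BS

-- Sketch of a chunked parser's result type. 'Partial' is the resupply point:
-- the driver feeds it the next chunk to continue parsing.
data Result e a
  = OK a BS.ByteString                     -- success, plus unconsumed input
  | Fail                                   -- recoverable failure
  | Err e                                  -- unrecoverable error
  | Partial (BS.ByteString -> Result e a)  -- out of input; resume with more
```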
Can you tell me more about your use cases? If we're talking about hundreds of MBs of JSON, and the output does fit reasonably into memory, it should be parseable in at most a few seconds in `flatparse`. 200 MB/s would be a reasonable lower estimate for going from JSON to uncompressed AST. If the output does not fit into memory, we could do compressed/succinct output or lazy output, but that still does not necessarily require input resupply.
Ofc supporting input resupply is better than not supporting it and there are use cases for it. Since my last comment here I thought a bit more about the implementation, and one question is: do we want "one-shot" or "multi-shot" resumption (borrowing effect handler terminology)? In the former case a suspended parser can be resumed only once, because resumption is destructive.
The former case can be implemented a lot more efficiently, simply using a new GHC thread for the parser which blocks on an MVar when it runs out of input.
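A minimal sketch of that thread-plus-`MVar` driver (the `Feed`/`forkParser` names are only illustrative, not a proposed API):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import qualified Data.ByteString as BS

-- The parser runs in its own thread and blocks on 'takeMVar' whenever it needs
-- another chunk. Resumption is one-shot: there is exactly one blocked thread
-- to wake up, so a suspended parse cannot be resumed twice.
data Feed a = Feed
  { supply :: BS.ByteString -> IO ()  -- hand the parser its next chunk
  , result :: MVar a                  -- filled once the parser finishes
  }

forkParser :: (IO BS.ByteString -> IO a) -> IO (Feed a)
forkParser parse = do
  input  <- newEmptyMVar
  output <- newEmptyMVar
  _ <- forkIO (parse (takeMVar input) >>= putMVar output)
  pure (Feed (putMVar input) output)
```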
The latter requires making a copy of the parser stack whenever we run out of input, so it requires the continuation primops. It's more expensive and only supported in GHC 9.6+.
Do you see any plausible use case for multi-shot resumption? What it allows is essentially branching parsing with multiple different supplied inputs. I've never seen applications for this to be honest.
> Lazy backtracking. Once asked for (via `(<|>)` or `lookahead`), the parser would keep track of all previous chunks, discarding references to them once safe to do so.
I don't quite understand this point. I think any chunked/resumable parser would keep track of required chunks just by virtue of them being live data and not being GC'd. On a backtracking `(<|>)` the chunk we backtrack to is just saved on the program stack.
> Can you tell me more about your use cases?
The only real one I know of that assumes chunking by default is `wai`'s `getRequestBodyChunk`. The `Content-Length` header is entirely optional, so this isn't as optimal as, say, reading a file from disk.
> any chunked/resumable parser would keep track of required chunks just by virtue of them being live data
Two distinct problems here:
- A function like `getRequestBodyChunk` cannot be replayed, so in the presence of backtracking we also need to keep track of which chunk was the last one discovered;
- I assume, perhaps naively, that a chunked parser only operates over a single chunk at a time, so when backtracking there will be extra chunks between the backtracking start and the known end. While the position is part of the local context, I don't know if these extra chunks count.
"one-shot" or "multi-shot" resumption
I'm out of my depth here, but after a bit of reading on this I think this question is just "what if we take `attoparsec`'s `Partial` and run it twice". I've never tried it in `attoparsec` and I assume it's unsafe there because the underlying input representation is not pure.

Input chunks, in this specific case, are always constant once discovered, so supporting an operation like this is entirely optional. This sounds "one-shot", but I don't know if running the parser against an `MVar` is the best solution here. The only argument I have against it is that the API is gonna be less "Haskelly".
> It's more expensive and only supported in GHC 9.6+.
My implementation already requires `text-2.0` or higher, which is effectively GHC 9.4+. The GHC status page says anything older than the three most recent major versions is outdated, and that's roughly two full years of time.

I can't say anything on the performance front because my main gripe with `attoparsec` isn't its speed, it's the fact that it does extra work because its internal input representation is generic.
Now that I look at the list of restrictions `flatparse` comes with, chunked parsing may be off the table simply because the library is razor-focused on performance as is and it willingly sacrifices functionality for said performance. A restriction like "Only little-endian 64 bit systems" squarely puts it into `hermes-json` territory for me: a good specialized solution if you need it, but not generic enough to build default solutions with.

Me saying "I would use `flatparse`" earlier was thus a mistake on my part, as the library I'm looking for is a currently nonexistent variation of `attoparsec` that is focused on low-level performance and specialized for streaming chunked input.
Sounds like we need Parsers à la Carte 😄
Chunked parsing is definitely not off the table, nor is comprehensive bit-width and endianness support. I actually think we're not far from 32-bit and big-endian support on host machines; it's just that so far no one has cared about it. The current features are based purely on what users want to have, although it is true that current users want to have high performance.
Now that @BurningWitness has expressed a desire for it, there's a better chance of chunked parsing being added. Technically, I don't think there's any issue. Basically, we need to convert the end-of-buffer pointer and the `ForeignPtrContents` to being threaded as state (in non-chunked parsers they're just reader parameters), and I believe the `MVar` method would be very efficient for single-shot resumption. There is definitely an overhead to the extra state, but I'm sure this would still be much faster than any other resumable parser in Haskell-land.
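A boxed model of that reader-to-state change, just to illustrate the shape (the real thing would keep the unboxed `Addr#`/`ForeignPtrContents` representation; all names here are made up):

```haskell
import qualified Data.ByteString as BS

-- In the non-chunked parser the buffer and its end are read-only environment;
-- for chunked parsing they become state that every result hands back, so a
-- resupply can swap in the next chunk before resuming.
data Buf = Buf
  { bufChunk :: BS.ByteString  -- stands in for ForeignPtrContents + end-of-buffer pointer
  , bufPos   :: Int            -- stands in for the current read address
  }

newtype ChunkedParser e a = ChunkedParser
  { runChunked :: Buf -> IO (Either e (a, Buf)) }
```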
We already provide native byte order machine integer parsers. They should only require a small CPP addition to also support non-64-bit word size systems. (I didn't pursue it because all I needed were explicit size & endianness parsers.)
Turns out I was wrong: Haskell does have a streaming parser in that sweet spot between `flatparse` and `attoparsec`, and it's called... `binary`. The darn convenience typeclasses really made me forget the parser is a whole thing of its own.
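For reference, `binary`'s incremental interface (`Data.Binary.Get`'s `Decoder`) can be driven chunk by chunk along these lines; `feedChunks` is just a throwaway helper here:

```haskell
import Data.Binary.Get (Decoder (..), Get, getWord32be, runGetIncremental)
import qualified Data.ByteString as BS

-- Feed chunks to binary's incremental Decoder one at a time; 'Nothing' tells
-- it that the input has ended.
feedChunks :: Get a -> [BS.ByteString] -> Either String a
feedChunks g chunks0 = go (runGetIncremental g) chunks0
  where
    go (Done _ _ a)   _          = Right a
    go (Fail _ _ err) _          = Left err
    go (Partial k)    (c : rest) = go (k (Just c)) rest
    go (Partial k)    []         = stop (k Nothing)

    stop (Done _ _ a)   = Right a
    stop (Fail _ _ err) = Left err
    stop (Partial _)    = Left "decoder demanded input after end of stream"

-- e.g. feedChunks getWord32be [BS.pack [0, 0], BS.pack [1, 0]]  ==  Right 256
```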
From looking at it, `binary` still has funky problems of its own, like `<|>` concatenating all previously consumed input strictly, but that looks like a small price to pay.