Resumable parsing using continuation primitives
AndrasKovacs opened this issue · 9 comments
So far, resumable parsers have really only been feasible with CPS-based internals, like in Attoparsec. Unfortunately, that has painful overhead because it effectively moves the control stack of the parsers to the heap, and continuation closure calls are also slower than native GHC returns.
GHC 9.6.x is now available in release candidates, and it has new primops for delimited continuations: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0313-delimited-continuation-primops.rst. It might be interesting to investigate the new primitives. The idea would be to save the current parser continuation when running out of input, instead of throwing an error.
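Roughly, the suspension could look like the sketch below. This is only an illustration, not flatparse's internals: the `Step`, `askMore` and `runResumable` names are made up, and the thin IO wrappers around `newPromptTag#`/`prompt#`/`control0#` follow the proposal and assume GHC 9.6+.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import qualified Data.ByteString as BS
import GHC.Exts (PromptTag#, control0#, newPromptTag#, prompt#)
import GHC.IO (IO (..))

-- Thin IO-level wrappers around the GHC 9.6 delimited continuation primops.
data PromptTag a = PromptTag (PromptTag# a)

newPromptTag :: IO (PromptTag a)
newPromptTag = IO $ \s -> case newPromptTag# s of
  (# s', tag #) -> (# s', PromptTag tag #)

prompt :: PromptTag a -> IO a -> IO a
prompt (PromptTag tag) (IO m) = IO (prompt# tag m)

control0 :: PromptTag a -> ((IO b -> IO a) -> IO a) -> IO b
control0 (PromptTag tag) f =
  IO (control0# tag (\k -> case f (\(IO m) -> IO (k m)) of IO a -> a))

-- Illustrative driver type: either the parse finished, or it ran out of input
-- and can be resumed with the next chunk.
data Step a
  = Finished a
  | NeedInput (BS.ByteString -> IO (Step a))

-- Instead of throwing an error at end of input, capture the current parser
-- continuation and hand it back to the driver. The prompt is re-installed on
-- resumption so the parser can suspend again later.
askMore :: PromptTag (Step a) -> IO BS.ByteString
askMore tag = control0 tag $ \k ->
  pure (NeedInput (\chunk -> prompt tag (k (pure chunk))))

-- Run a parser that may suspend itself via 'askMore'.
runResumable :: (PromptTag (Step a) -> IO a) -> IO (Step a)
runResumable parse = do
  tag <- newPromptTag
  prompt tag (Finished <$> parse tag)
```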
On further consideration, I don't think that Attoparsec-style resumption is useful enough. All it gets us is amortization of the cost of parsing itself, but it gives us no usable incremental results. For example, when the input arrives in a bunch of chunks, we can parse each chunk immediately as it arrives, but we get no output at all until all chunks are parsed. All we get is a potential latency improvement compared to just buffering bytestring chunks and parsing them in one go, and I find it unlikely that flatparse would be a latency bottleneck in practice.
I've just completed writing a JSON parser from scratch (GaloisInc/json#17) and I used `attoparsec` specifically because `flatparse` only works over strict `ByteString`s.
> we get no output at all until all chunks are parsed
Only in the most basic implementation. If you wrap the results into a Stream, you can make a `Parser (a, Parser (a, Parser ...))` chain of arbitrary depth. It looks a tad crude because the `Parser` needs to be run anew every time, but it works well and is composable.
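For illustration, the shape of that chain could look something like this (a hypothetical sketch; `Stream` and `drain` are made-up names, and `p` stands in for whichever parser type is used):

```haskell
-- Each step parses one element together with the parser for the rest of the
-- stream, if there is more to come.
newtype Stream p a = Stream { stepStream :: p (a, Maybe (Stream p a)) }

-- Drain the whole chain, given a way to run one parser step (for instance,
-- over the next input chunk). Every step is run anew, as described above.
drain
  :: Monad m
  => (p (a, Maybe (Stream p a)) -> m (a, Maybe (Stream p a)))
  -> Stream p a
  -> m [a]
drain run (Stream step) = do
  (a, next) <- run step
  case next of
    Nothing   -> pure [a]
    Just rest -> (a :) <$> drain run rest
```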
> All we get is potential latency improvement...
Yes, since we can't know that the input is correct until we've traversed all of it, the only benefit of a chunked parser is the potential for lower memory and latency overheads. A strict parser will always be more performant on small inputs, but it will become prohibitively expensive once we're into hundreds of megabytes of input data.
Also I don't think a chunked parser can share the foundation with the strict one. From my understanding there are two extra operations I would want:
- Input resupply. Causes the parser to return a `Partial (ByteString -> Parser)` (see the sketch after this list). I assume embedding such an option into a non-CPS parser is exactly what delimited continuations were added for;
- Lazy backtracking. Once asked for (via `(<|>)` or `lookahead`), the parser would keep track of all previous chunks, discarding references to them once safe to do so. The main benefit here is never having to reallocate the input data, and this is something `attoparsec` can't do because its internal input representation is strict.
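A hypothetical shape for the first point, following flatparse's existing `OK`/`Fail`/`Err` result style with an extra constructor; the `Partial` name is borrowed from attoparsec and is not an existing flatparse API:

```haskell
import qualified Data.ByteString as BS

-- Sketch of a chunked parser's result type. 'Partial' is the resupply point:
-- the driver feeds it the next chunk to continue parsing.
data Result e a
  = OK a BS.ByteString                     -- success, plus unconsumed input
  | Fail                                   -- recoverable failure
  | Err e                                  -- unrecoverable error
  | Partial (BS.ByteString -> Result e a)  -- out of input; resume with more
```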
Can you tell me more about your use cases? If we're talking about hundreds of MBs of JSON, and the output does fit reasonably into memory, it should be parseable in at most a few seconds in `flatparse`. 200 MB/s would be a reasonable lower estimate for going from JSON to uncompressed AST. If the output does not fit into memory, we could do compressed/succinct output or lazy output, but that still does not necessarily require input resupply.
Ofc supporting input resupply is better than not supporting it and there are use cases for it. Since my last comment here I thought a bit more about the implementation, and one question is: do we want "one-shot" or "multi-shot" resumption (borrowing effect handler terminology)? In the former case a suspended parser can be resumed only once, because resumption is destructive.
The former case can be implemented a lot more efficiently, simply using a new GHC thread for the parser which blocks on an MVar when it runs out of input.
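A minimal sketch of that thread-plus-`MVar` driver (the `Feed`/`forkParser` names are only illustrative, not a proposed API):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import qualified Data.ByteString as BS

-- The parser runs in its own thread and blocks on 'takeMVar' whenever it needs
-- another chunk. Resumption is one-shot: there is exactly one blocked thread
-- to wake up, so a suspended parse cannot be resumed twice.
data Feed a = Feed
  { supply :: BS.ByteString -> IO ()  -- hand the parser its next chunk
  , result :: MVar a                  -- filled once the parser finishes
  }

forkParser :: (IO BS.ByteString -> IO a) -> IO (Feed a)
forkParser parse = do
  input  <- newEmptyMVar
  output <- newEmptyMVar
  _ <- forkIO (parse (takeMVar input) >>= putMVar output)
  pure (Feed (putMVar input) output)
```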
The latter requires making a copy of the parser stack whenever we run out of input, so it requires the continuation primops. It's more expensive and only supported in GHC 9.6+.
Do you see any plausible use case for multi-shot resumption? What it allows is essentially branching parsing with multiple different supplied inputs. I've never seen applications for this to be honest.
> Lazy backtracking. Once asked for (via `(<|>)` or `lookahead`), the parser would keep track of all previous chunks, discarding references to them once safe to do so.
I don't quite understand this point. I think any chunked/resumable parser would keep track of required chunks just by virtue of them being live data and not being GC'd. On a backtracking `(<|>)` the chunk we backtrack to is just saved on the program stack.
> Can you tell me more about your use cases?
The only real one I know of that assumes chunking by default is `wai`'s `getRequestBodyChunk`. The `Content-Length` header is entirely optional, so this isn't as optimal as, say, reading a file from disk.
> any chunked/resumable parser would keep track of required chunks just by virtue of them being live data
Two distinct problems here:
- A function like `getRequestBodyChunk` cannot be replayed, so in the presence of backtracking we also need to keep track of which chunk was the last one discovered;
- I assume, perhaps naively, that a chunked parser only operates over a single chunk at a time, so when backtracking there will be extra chunks between the backtracking start and the known end. While the position is part of the local context, I don't know if these extra chunks count.
"one-shot" or "multi-shot" resumption
I'm out of my depth here, but after a bit of reading on this I think this question is just "what if we take `attoparsec`'s `Partial` and run it twice". I've never tried it in `attoparsec` and I assume it's unsafe there because the underlying input representation is not pure.

Input chunks, in this specific case, are always constant once discovered, so supporting an operation like this is entirely optional. This sounds "one-shot", but I don't know if running the parser against an `MVar` is the best solution here. The only argument I have against it is that the API is gonna be less "Haskelly".
> It's more expensive and only supported in GHC 9.6+.
My implementation already requires `text-2.0` or higher, which is effectively GHC 9.4+. The GHC status page says anything older than the three most recent major versions is outdated, and that's roughly two full years of time.

I can't say anything on the performance front because my main gripe with `attoparsec` isn't its speed, it's the fact that it does extra work because its internal input representation is generic.
Now that I look at the list of restrictions `flatparse` comes with, chunked parsing may be off the table simply because the library is razor-focused on performance as is and it willingly sacrifices functionality for said performance. A restriction like "Only little-endian 64 bit systems" squarely puts it into `hermes-json` territory for me: a good specialized solution if you need it, but not generic enough to build default solutions with.

Me saying "I would use `flatparse`" earlier was thus a mistake on my part, as the library I'm looking for is a currently nonexistent variation of `attoparsec` that is focused on low-level performance and specialized for streaming chunked input.
Sounds like we need Parsers à la Carte 😄
Chunked parsing is definitely not off the table, nor is comprehensive bit-width and endianness support. I actually think we're not far from 32-bit and big-endian support on host machines; it's just that so far no one has cared about it. The current features are based purely on what users want to have, although it is true that current users want to have high performance.
Now that @BurningWitness has expressed a desire for it, there's a better chance of chunked parsing being added. Technically, I don't think there's any issue. Basically, we need to convert the end-of-buffer pointer and the `ForeignPtrContents` to being threaded as state (in non-chunked parsers they're just reader parameters), and I believe the `MVar` method would be very efficient for single-shot resumption. There is definitely an overhead to the extra state, but I'm sure this would still be much faster than any other resumable parser in Haskell-land.
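A boxed model of that reader-to-state change, just to illustrate the shape (the real thing would keep the unboxed `Addr#`/`ForeignPtrContents` representation; all names here are made up):

```haskell
import qualified Data.ByteString as BS

-- In the non-chunked parser the buffer and its end are read-only environment;
-- for chunked parsing they become state that every result hands back, so a
-- resupply can swap in the next chunk before resuming.
data Buf = Buf
  { bufChunk :: BS.ByteString  -- stands in for ForeignPtrContents + end-of-buffer pointer
  , bufPos   :: Int            -- stands in for the current read address
  }

newtype ChunkedParser e a = ChunkedParser
  { runChunked :: Buf -> IO (Either e (a, Buf)) }
```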
We already provide native byte order machine integer parsers. They should only require a small CPP addition to also support non-64-bit word size systems. (I didn't pursue it because all I needed were explicit size & endianness parsers.)
Turns out I was wrong: Haskell does have a streaming parser in that sweet spot between `flatparse` and `attoparsec`, and it's called... `binary`. The darn convenience typeclasses really made me forget the parser is a whole thing of its own.
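For reference, `binary`'s incremental interface (`Data.Binary.Get`'s `Decoder`) can be driven chunk by chunk along these lines; `feedChunks` is just a throwaway helper here:

```haskell
import Data.Binary.Get (Decoder (..), Get, getWord32be, runGetIncremental)
import qualified Data.ByteString as BS

-- Feed chunks to binary's incremental Decoder one at a time; 'Nothing' tells
-- it that the input has ended.
feedChunks :: Get a -> [BS.ByteString] -> Either String a
feedChunks g chunks0 = go (runGetIncremental g) chunks0
  where
    go (Done _ _ a)   _          = Right a
    go (Fail _ _ err) _          = Left err
    go (Partial k)    (c : rest) = go (k (Just c)) rest
    go (Partial k)    []         = stop (k Nothing)

    stop (Done _ _ a)   = Right a
    stop (Fail _ _ err) = Left err
    stop (Partial _)    = Left "decoder demanded input after end of stream"

-- e.g. feedChunks getWord32be [BS.pack [0, 0], BS.pack [1, 0]]  ==  Right 256
```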
From looking at it, `binary` still has funky problems of its own, like `<|>` concatenating all previously consumed input strictly, but that looks like a small price to pay.