dmaevsky/rd-parse

Fork: streaming parser

Closed this issue · 3 comments

Sorry, this isn't actually an issue; I just don't have another way to contact you.

I'm going to do a hard fork of rd-parse to create a parser over iterable inputs. I've just finished developing the appropriate primitive for consuming the input iterable with k items of lookahead. I also built a regex engine that can match against iterables. I think I can integrate those components into this framework pretty easily, but I don't see any point in trying to make rd-parse itself do those things. My fork won't be nearly as fast or as small as rd-parse, but it'll do what I need it to do: parse file headers without incurring the cost of reading the whole file.
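For readers unfamiliar with the idea, here is a minimal sketch of a bounded-lookahead wrapper over an iterable. It is an illustration only, not the actual primitive mentioned above; the `Lookahead` class and its `peek`/`next` methods are hypothetical names.

```javascript
// Hypothetical helper (not part of rd-parse or iter-tools): wraps any
// iterable and buffers at most k items of lookahead.
class Lookahead {
  constructor(iterable, k = 1) {
    this.iterator = iterable[Symbol.iterator]();
    this.k = k;
    this.buffer = [];
  }

  // Return the item `offset` positions ahead without consuming it,
  // or undefined if the input ends first.
  peek(offset = 0) {
    if (offset >= this.k) {
      throw new RangeError(`lookahead is limited to ${this.k} items`);
    }
    // Pull from the source only as far as the requested offset.
    while (this.buffer.length <= offset) {
      const { value, done } = this.iterator.next();
      if (done) return undefined;
      this.buffer.push(value);
    }
    return this.buffer[offset];
  }

  // Consume and return the next item, or undefined at end of input.
  next() {
    this.peek(0);
    return this.buffer.shift();
  }
}
```

Because items are pulled only when `peek` or `next` asks for them, the wrapper never reads further into the source than the caller has requested.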

I'm gonna call it @iter-tools/rd-parse and the repo will be here.

To make this work well I'd also need to convert the current depth-first search algorithm into a breadth-first search. The main goal would be to evaluate and prune failing branches ASAP so that they can be discarded and data can be freed from buffers.
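As a rough illustration of the breadth-first idea, the sketch below advances every alternative in lockstep, one input item at a time, and drops an alternative the moment it mismatches. It handles only non-empty literal strings, which is a large simplification of a real grammar, and `matchAlternatives` is a hypothetical name.

```javascript
// Hypothetical sketch: try several non-empty literal alternatives in a
// single pass over the input, pruning each branch as soon as it fails.
function matchAlternatives(input, literals) {
  let live = literals.map((literal) => ({ literal, pos: 0 }));
  for (const ch of input) {
    // Prune every branch whose next expected character does not match.
    live = live.filter((branch) => branch.literal[branch.pos] === ch);
    if (live.length === 0) return null; // all branches failed: stop reading
    for (const branch of live) branch.pos += 1;
    // Return the first alternative that has been matched completely.
    const finished = live.find((b) => b.pos === b.literal.length);
    if (finished) return finished.literal;
  }
  return null; // input ended before any alternative completed
}
```

Since no branch ever needs to rewind, each input item is consumed once and can be released immediately, which is the property that matters when the input is an iterable rather than a string.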

For the moment I've changed my mind. I realized two things. One, the depth-first approach of recursive descent would be a bad match for an iterable. Two, I don't need a whole streaming parser. I'm just parsing header comments, and a C comment block can be parsed with a regex. I already wrote a streaming regex engine, so it's easy for me to extract the first C header comment from a file while reading no more of it than needed. Once I have the content of the first comment in a string, I can use rd-parse on it.
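A minimal sketch of that extraction step, assuming the file arrives as an iterable of string chunks. A plain `indexOf` scan stands in for the streaming regex engine here, and `firstBlockComment` is a hypothetical name.

```javascript
// Hypothetical stand-in for the streaming regex: returns the body of the
// first /* ... */ comment, pulling chunks only until its closing delimiter.
function firstBlockComment(chunks) {
  let text = '';
  let start = -1;
  for (const chunk of chunks) {
    text += chunk; // accumulate so delimiters split across chunks are found
    if (start === -1) start = text.indexOf('/*');
    if (start !== -1) {
      // Search past the opener so that "/*/" is not treated as a full comment.
      const end = text.indexOf('*/', start + 2);
      // Returning from for...of closes the iterator; no further chunks are read.
      if (end !== -1) return text.slice(start + 2, end);
    }
  }
  return null; // no complete block comment in the input
}
```

The returned string is the comment body without its delimiters, which can then be handed to an ordinary string-based parser such as rd-parse. Reading stops at chunk granularity, so the chunk containing `*/` is the last one pulled.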

That said, I'm still forking the library for now. I need to be able to tinker with it. Maybe my fork will get merged back, maybe not. It's a small library so it shouldn't matter too much.