This library is SLOW. For large input sequences or long chains of parsers, it will also likely eat an unreasonable amount of memory, even compared to other pure Python parsers like Parsimonious. This is a PROOF OF CONCEPT. Do NOT use it in production code.
Pycoparsec is my attempt at making a parser combinator style parsing library for Python. The design of the code and of the library takes after Megaparsec, a Haskell parser combinator library, although its feature set aligns more closely with Attoparsec: you can stream tokens into Pycoparsec, and its error reporting is shitty (right now it just raises an empty `FailedParsing` object, lol).
My goals for the project are as follows:
- Type safety, or at least as close to it as Python can get. The whole library is PEP 484 type hinted. I've opted to keep it 3.8 compatible -- that means no `typing.Self` or subscripting `list`. That can change in the future.
- The ability to ingest arbitrary iterators. This means no peeking ahead at the rest of the tokens, and this means sexy error messages would require me to do hella extra bookkeeping.
- The ability to construct arbitrary Python objects spat directly out of the parser. It currently does this by folding successive objects with `+`, so if you want to construct objects in a smarter way you'll have to construct your own output classes. There's some funkiness with how object construction even happens, with the method to construct intermediate output objects embedded directly in the signature of `Parser.exactly`. I am not sure I am satisfied with this yet. No monoids and semigroups means no `mappend` and `<>` to automagically build objects for us.
- Rich test suite. I haven't used `pytest` much, but damnit, I'm gonna learn! Tomorrow.
- Code readability. In a perfect world I would like the main chunk of the code to be a well documented ~500 LoC. You should be able to audit the whole library in an evening, and emerge on the other side with a full understanding of it.
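To make the "construct your own output classes" goal concrete, here is a minimal sketch of what such a class could look like. It does not import Pycoparsec at all: `Tokens` and `token` are hypothetical names, and the `reduce(add, ...)` call is only a stand-in for the folding with `+` described above, on the assumption that the factory handed to `Parser.exactly` is called with the matched token.

```python
from dataclasses import dataclass, field
from functools import reduce
from operator import add
from typing import List


@dataclass(frozen=True)
class Tokens:
    """Hypothetical output class: collects items into a list instead of concatenating strings."""

    items: List[str] = field(default_factory=list)

    def __add__(self, other: "Tokens") -> "Tokens":
        # Folding with `+` only needs __add__, so this is the whole contract.
        return Tokens(self.items + other.items)


def token(raw: str) -> Tokens:
    """Hypothetical factory: wrap one raw token in the output class."""
    return Tokens([raw])


# Stand-in for the parser's fold: combine successive outputs with `+`.
result = reduce(add, (token(c) for c in "abc"))
print(result.items)  # ['a', 'b', 'c']
```

Anything that defines `__add__` should work the same way, so the output type can be a list wrapper like this, an AST node, or whatever else you want the parser to hand back.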
PRs, issues, and contributions welcome. Thanks for reading.
- Some way to easily repeat parsers without calling `Parser.then` over and over again. You can do some silly stuff like `parser.then(parser)`, but at some point you're gonna blow the stack and then nobody's having fun.
- A test suite!
Everything revolves around the `Parser` object, and that's your building block for everything else. Construct one, then pass a factory to `Parser.exactly` or combine it with other parsers with `Parser.choice` or `Parser.then`. A parser that never had `Parser.exactly` called on it will always fail, and will either end the chain it is in or proceed to the next alternative. It is dead simple -- the rest is up to you. Here's a fun recipe:
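from pycoparsec import Parser

def string_parser(wanted_string):
    # Match the first character, then chain one parser per remaining character.
    out = Parser().exactly(wanted_string[0], str)
    for c in wanted_string[1:]:
        # Keep the result of `then`, in case it returns a new parser rather than mutating `out`.
        out = out.then(Parser().exactly(c, str))
    return out

string_parser("Hello").run(c for c in "Hello, world!")  # => "Hello"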
Or maybe you're more alternative?
from pycoparsec import Parser
only_accept_0_to_9 = (
    Parser().exactly(1, str) | Parser().exactly(2, str) | Parser().exactly(3, str) | Parser().exactly(4, str) | Parser().exactly(5, str)
    | Parser().exactly(6, str) | Parser().exactly(7, str) | Parser().exactly(8, str) | Parser().exactly(9, str) | Parser().exactly(0, str)
)
only_accept_0_to_9.run(n for n in range(100)) # => 0
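Typing out ten alternatives by hand gets old fast. As a rough sketch, the same chain can be folded together with `functools.reduce` and `operator.or_`. The helper name `any_of` is hypothetical and not part of Pycoparsec. It only assumes the parsers support `|` the way the recipe above uses it, and the Pycoparsec usage in the trailing comment is untested.

```python
from functools import reduce
from operator import or_
from typing import Iterable, TypeVar

P = TypeVar("P")


def any_of(parsers: Iterable[P]) -> P:
    """Fold a non-empty iterable of parsers into one chain of alternatives using `|`."""
    # reduce raises TypeError on an empty iterable, which is what we want:
    # a choice between zero alternatives has no sensible result here.
    return reduce(or_, parsers)


# Assumed usage with Pycoparsec (untested, mirrors the recipe above):
#   digit = any_of(Parser().exactly(n, str) for n in range(10))
#   digit.run(n for n in range(100))
```

Note that `range(10)` puts `0` first rather than last. That should not matter when every alternative matches a different single token, but it is an assumption about how alternatives are tried.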
This package is built using Hatch. This project was partly an excuse to try Hatch, so I've used it to the highest degree possible. Type `hatch env show` to list all of the goodies available to you. If you want to run one of the listed scripts, the syntax is `hatch run <ENV NAME>:<SCRIPT NAME>`. So for example, to open the docs the command is `hatch run docs:open`, which automatically builds them and calls `xdg-open` on the index.