Python Parsing Tools

A list of Python parsing tools initially imported from @nedbat's blog post.

The List

| Name | Description | License | Updated | Parses | Used By | Notes |
|---|---|---|---|---|---|---|
| Ply | Docstrings are used to associate lexer or parser rules with actions. The lexer uses Python regular expressions. | LGPL | v3.6 4/2015 | LALR(1) | lesscpy | ply-hack group |
| pyparsing | Direct parser objects in Python, built to parallel the grammar. | MIT | v2.0.3 8/2014 | | twill | |
| ANTLR | Parser and lexical analyzer generator in Java. Generates parsing code in Python (as well as Java, C++, C#, Ruby, etc). | BSD | v4.4 7/2014 | LL(*) | | |
| pyPEG | A parsing expression grammar toolkit for Python. | GPL | v2.15 1/2015 | PEG | | |
| pydsl | A language workbench written in Python. | GPLv3 | v0.5.2 11/2014 | | | |
| LEPL | A recursive descent parser. | dual licensed MPL/LGPL | v5.1.3 9/2012 | | | Discontinued |
| Codetalker | Python-based grammar definitions. | MIT | v1.1 3/2014 | | | |
| funcparserlib | Recursive descent parsing library for Python based on functional combinators. | MIT | v0.3.6 5/2013 | | | |
| picoparse | | | v0.9 3/2009 | | | |
| Aperiot | | Apache 2.0 | v0.1.12 1/2012 | | | |
| PyGgy | Lexes with DFA from a specification in a .pyl file. Parses GLR grammars from a specification in a .pyg file. Rules in both files have Python action code. Unlike most parser generators, the rule actions are all executed in a post-processing step. The parser isn't represented as a discrete object, but as globals in a module. | Public Domain | v0.4 8/2004 | | | Python 3 compatible fork 0.4.1, discussion group |
| Parsing | LR(1) parser generator as well as CFSM and GLR parser drivers. | MIT | v1.4 12/2012 | LR(1), CFSM, and GLR | | |
| Rparse | | GPL | v1.1.0 4/2010 | | | LL(1) parser generator with AST generation. |
| SableCC | Java-based parser and lexical analyzer generator. Generates parsing code in Java, with alternative generators for other languages including Python. | LGPL | v3.7 11/2012 | | | |
| GOLD Parser | | zlib/libpng | v5.2.0 8/2012 | LALR | | |
| Plex | Python module for constructing lexical analysers. | LGPL | v2.0 12/2009 | | | Compiles all of the regular expressions into a single DFA. |
| Plex3 | Python3 port of Plex | | 8/2012 | | | No official release |
| yeanpypa | Yeanpypa is (yet another) framework to create recursive-descent parsers in Python. | Public Domain | 4/2010 | | | Parsers are created by writing an EBNF-like grammar as Python expressions. |
| ZestyParser | | MIT | v0.8.1 4/2007 | | | |
| BisonGen | | | v0.8.0b1 4/2005 | | | |
| DParser for Python | A scannerless GLR parser | BSD | v1.3.0 3/2013 | | | Charming Python: DParser for Python: Exploring Another Python Parser |
| Yapps | Produces recursive-descent parsers, as a human would write. Designed to be easy to use rather than powerful or fast. Better suited for small parsing tasks like email addresses, simple configuration scripts, etc. | MIT | v2.1.1 8/2003 | | | |
| PyBison | Python binding to the Bison (yacc) and Flex (lex) parser-generator utilities | GPL | v0.1.8 6/2004 | LALR(1) | | Doesn't yet support Windows. |
| Yappy | | | v1.9.4 8/2014 | SLR, LR(1) and LALR(1) | | Uses Python strings to declare the grammar. |
| Toy Parser Generator | | LGPL | v3.2.2 12/2013 | | | |
| kwParsing | An experimental parser generator implemented in Python which generates parsers implemented in Python. | Python License | v1.3 | SLR | Gadfly | |
| Martel | Martel uses regular expression format definition to generate a parser for flat- and semi-structured files. The parse tree information is passed back to the caller using XML SAX events. In other words, Martel lets you parse flat files as if they are in XML. | BSD | v0.8 12/2001 | | BioPython versions 1.4-1.48 | Last version included in BioPython |
| SimpleParse | Lexing and parsing in one step, but only deterministic grammars. | BSD | 2.11a2 8/2010 | | | |
| mxTextTools | An unusual table-based parser. There is no generation step; the parser is hand-coded using primitives provided by the package. The parser is implemented in C for speed. | eGenix Public License, similar to Python, compatible with GPL. | v3.2.8 7/2014 | | SimpleParse, Martel | |
| SPARK | Uses docstrings to associate productions with actions. Unlike other tools, also includes semantic analysis and code generation phases. | MIT | v0.7 pre-alpha 7 5/2002 | | | |
| FlexModule and BisonModule | Macros to allow Flex and Bison to produce idiomatic lexers and parsers for Python. The generated lexers and parsers are in C, compiled into loadable modules. | Pythonesque | v2.1 3/2002 | | | |
| Bison in a box | Uses standard Bison to generate pure Python parsers. It actually reads the bison-generated .c file and generates Python code. | GPL | v0.1.0 6/2001 | LALR(1) | | |
| Berkeley YACC | Classic YACC, extended to generate Python code. Python support seems to be undocumented. | Public Domain | v20141128 11/2014 | LALR(1) | | |
| PyLR | Lexer is based on regular expressions. | | 12/1997 | LR | | |
| PyLR | PyLR is a partial Python implementation of the OpenLR specification | Apache 2.0 | 12/2014 | | | announcement |
| Construct | A declarative parser (and builder) for binary data. | BSD | v2.5.2 4/2014 | | | |
| ModGrammar | A general-purpose library for constructing language parsers and interpreters for context-free grammar definitions. | BSD | v0.10 2/2013 | | | |
| lrparsing | Differs from other Python LR(1) parsers in using Python expressions as grammars, and offers disambiguation tools. | AGPLv3 | v1.0.11 3/2015 | | | LR(1) parser and a tokeniser |
| docopt | Generates a parser based on formalized conventions that are used for help messages and man pages for program interface description. | MIT | v0.6.2 6/2014 | | | |
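
Several entries above describe themselves as recursive-descent parsers. For readers unfamiliar with the term, here is a minimal hand-written sketch using only the standard library. The toy arithmetic grammar and the names lex, Parser and evaluate are invented for illustration; none of this is the API of any tool in the list.

```python
import re

# Hypothetical toy grammar (not taken from any listed tool):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def lex(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule: the core idea of recursive descent."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the next token without consuming it.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        # Consume and return the next token (None at end of input).
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate an arithmetic expression in one pass."""
    parser = Parser(lex(text))
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return result
```

Calling evaluate("2 + 3 * (4 - 1)") walks expr, term and factor in turn and respects operator precedence because each precedence level has its own method. The generator-based tools in the list produce code of roughly this shape from a grammar file, while the combinator libraries build it from Python objects.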

Standard Modules

The Python standard library includes a few modules for special-purpose parsing problems. These are not general-purpose parsers, but don't overlook them. If your need overlaps with their capabilities, they're perfect:

  • shlex lexes command lines using the rules common to many operating system shells.
  • configparser (named ConfigParser in Python 2) implements a basic configuration language with a structure similar to Microsoft Windows INI files.
  • argparse makes it easy to write user-friendly command-line interfaces. The program defines what arguments it requires, and argparse will figure out how to parse those out of sys.argv. The argparse module also automatically generates help and usage messages and issues errors when users give the program invalid arguments.
  • email provides many services, including parsing email and other RFC 822 structures.
  • parser parses Python source text. (It was deprecated in Python 3.9 and removed in 3.10; the ast module covers most of its uses.)
  • cmd implements a simple command interface, prompting for and parsing out command names, then dispatching to your handler methods.
  • json is a JSON (JavaScript Object Notation) encoder and decoder.
  • tokenize is a lexical scanner for Python source code, implemented in Python.
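
As a quick taste of the modules above, the following sketch runs five of them on small inline inputs. The sample strings, the "demo" program name and the variable names are made up for illustration; the calls themselves are standard library functions in Python 3.

```python
import argparse
import configparser
import io
import json
import shlex
import tokenize

# shlex: split a command line using POSIX shell quoting rules.
args = shlex.split('grep -n "hello world" notes.txt')

# configparser: read an INI-style document from a string.
config = configparser.ConfigParser()
config.read_string("[server]\nhost = example.org\nport = 8080\n")
port = config.getint("server", "port")

# argparse: declare the expected arguments, then parse an explicit list
# (parse_args() with no argument would read sys.argv instead).
cli = argparse.ArgumentParser(prog="demo")
cli.add_argument("path")
cli.add_argument("--verbose", action="store_true")
options = cli.parse_args(["--verbose", "input.txt"])

# json: decode a JSON document into Python objects.
data = json.loads('{"name": "ply", "parses": ["LALR(1)"]}')

# tokenize: scan Python source into tokens and keep only the identifiers.
source = "total = price * 2\n"
names = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.NAME
]
```

After this runs, args holds four items with "hello world" kept as a single argument, port is the integer 8080, options.verbose is True, and names contains the two identifiers from the source line.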

Articles

Licensing and Attribution

Python Parsing Tools by Michael R. Bernstein is licensed under a Creative Commons Attribution 4.0 International License.
Based on a work at https://github.com/webmaven/python-parsing-tools/.