CL-ORG-MODE-PARSER - Parsing org-mode files.
Copyright (C) 2013 Olof-Joachim Frahm
Released under a Simplified BSD license.
Implementations currently running on: SBCL. Should be portable though.
Uses fiveam for testing and a couple of general purpose libraries.
There are actually a lot of other packages for the same purpose. The aim of this one is to parse org-mode files via a SAX-like event interface into e.g. CLOS-based documents and to offer some options with regards to parsed elements, interning of tags and similar things.
The parser is currently line- and regex-based, however it might be feasible to use a parser generator instead. For that the syntax description at worg is canonical and should probably be used.
Provided you have a handler, parsing a document is in the simplest case
only a call to PARSE, which by default uses DOCUMENT-BUILDER:
> (cl-org-mode-parser:parse #P"foo.org")
> =>
> #<CL-ORG-MODE-PARSER:DOCUMENT with 2 nodes {1003867DA3}>
Although there are already some test files, it would generally be nice to compare a lot of libraries and their handling of various edge cases. This includes e.g. missing spaces between elements, which however could still be parsed by relaxing the rules a bit (and providing an option to toggle this behaviour).
- figure out whether the
CHARACTERShandler should remain line-based, or be like in SAX with arbitrary chunks of text as input - how much trouble is the whole "keep the printed representation" really (worth)?
- basic roundtrip
- include line/column information in errors
- proper subclasses for (continuable) errors
- parse a lot more special syntax (tables, drawers, ...)