org-parser

Primary language: Clojure. License: GNU Affero General Public License v3.0 (AGPL-3.0).

General

Clojars Project

Community chat: #organice on IRC Libera.Chat, or #organice:matrix.org on Matrix

What does this project do?

org-parser is a parser for the Org mode markup language for Emacs.

It can be used from JavaScript, Java, Clojure and ClojureScript!

Why is this project useful / Rationale

Org mode in Emacs is implemented in org-element.el (API documentation). The spec for the Org syntax is written in prose.

This is already great work, yet it has some drawbacks:

  1. The spec is not machine-readable. Hence, documentation and implementation can drift apart. In fact, while developing organice (our web-based Org implementation with great mobile phone support) and org-parser, we have encountered such drift.
  2. org-element.el is naturally written in Emacs Lisp and makes strong use of Emacs as a text processor. Hence, its code can only be used within Emacs.

Writing the official spec is already an amazing effort toward standardizing the Org format, and the power of Org is so enticing that many want to use it outside of Emacs as well. Since org-element.el only runs in Emacs, a myriad of implementations has been created for other platforms (JavaScript, Rust, Go, Java, etc.). Most of them are only partial and, more importantly, each one creates another island. Since they are just as tied to their programming language as org-element.el is, it is impossible to share logic between them.

org-parser aims to alleviate both of these issues. It documents the syntax in a standard, machine-readable notation (EBNF). The reference implementation runs on the established virtual machines of Java and JavaScript, so org-parser can be used from all programming languages running on those virtual machines. org-parser provides a higher-level data structure that is easy to consume for an application working with Org mode data. Even if your application is not running on the Java or JavaScript virtual machines, you can embed org-parser as a command-line application. Lastly, org-parser brings a strong test suite that documents the reference implementation in yet another unambiguous way.

Our aim is for org-parser to be the foundation on which many Org mode applications in many different languages can be built. Applications using org-parser can then focus on implementing user-facing features and don’t have to worry about implementing the Org syntax itself.

Architecture

The code base of org-parser is split into four namespaces:

  • org-parser.core (top level api, i.e. read-str, write-str)
  • org-parser.parse (aka. deserializer, reader)
  • org-parser.parse.transform (transforms the result of the parser into a more desirable structure)
  • org-parser.render (aka. serializer, writer)

Thus org-parser has become a misnomer, in the sense that it now strives to be clojure/data.org (following the pattern of existing Clojure libraries like data.json, data.xml, data.csv, etc.), providing both reader and writer capabilities for the Org serialization format.

Project State

This project is work-in-progress. It is not ready for production yet because the structure of the AST (parse tree) can still change.

The biggest milestones are:

  • [X] Finish EBNF parser to support most Org mode syntax
    • [X] Headlines
    • [X] Org mode #+* stuff
    • [X] Timestamps
    • [X] Links
    • [X] Text links
    • [X] Footnotes
    • [X] Styled text
    • [X] Drawers and #+BEGIN_xxx blocks
    • [ ] Nested markup (see #12)
  • [X] Setup basic transformation from the parse tree to a higher-level structure.
  • [-] Transformations to higher-level structure: catch up with features that are already supported by the EBNF parser.
  • [-] Render parsed org file with write-str

It can already be useful to you: if your script needs to parse only some Org mode features, our EBNF parser probably supports them already. Do not underestimate timestamps, for example. Use our well-tested parser to break them into their parts instead of writing a poor and ugly regex that handles only a subset of Org mode’s timestamps ;)

Don’t hesitate to contribute!

Development

org-parser uses instaparse, which aims to be the simplest way to build parsers in Clojure. Apart from living up to this claim (even beyond Clojure itself), instaparse is great for another reason: it works on both CLJ and CLJS. Therefore org-parser can be used from both ecosystems, which of course include Java and JavaScript, and so fits a wide range of situations.

Prerequisites

Please install Clojure and Leiningen.

No additional installation is required. Leiningen will pull dependencies as needed.

Testing

Running the tests:

# Clojure
lein test
# CLJS (starts a watcher)
lein doo node

If you’re not familiar with Lisp or Clojure, here’s a short video on how the tooling for Lisp (and hence Clojure) is great and enables fast developer feedback and high quality applications. Initially, the video was created to answer a specific issue on this repository. However, the question is a valid general question that is asked quite often by people who haven’t used a Lisp before.

You can watch it here: https://youtu.be/o2MLHFGUkoQ

Release and Dependency Information

Note: The version number should be replaced with the current version of org-parser. See the clojars badge at the top of this README.

CLI/deps.edn dependency information:

org-parser/org-parser {:mvn/version "0.1.4"}
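
For context, the coordinate above goes under the :deps key of a deps.edn file. The fragment below is a minimal sketch of such a file, not taken from the project; the version string is the one shown above and should be replaced with the current release.

```clojure
;; deps.edn (sketch): replace "0.1.4" with the current org-parser release.
{:deps {org-parser/org-parser {:mvn/version "0.1.4"}}}
```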

Leiningen dependency information:

[org-parser "0.1.4"]


Usage

At the moment, you can run org-parser from Clojure, ClojureScript, or Java. Other targets that are hosted on the JVM or on JavaScript are possible.

Clojure Library

(ns hello-world.core
  (:require [org-parser.parser :refer [parse]]
            [org-parser.core :refer [read-str write-str]]))

(prn (parse "* Headline"))
(prn (read-str "* Headline"))
(println (write-str (read-str "* Headline")))
;; Results of the three calls above, in order:
;; [:S [:headline [:stars "*"] [:text [:text-normal "Headline"]]]]
;; {:headlines [{:headline {:level 1, :title [[:text-normal "Headline"]], :planning [], :tags []}}]}
;; "* Headline\n"

Clojure

Run lein run file.org, for example:

lein run test/org_parser/fixtures/schedule_with_repeater.org
{:headlines [{:headline {:level 1, :title [[:text-sty-bold "Header"] [:text-normal " with repeater"]], :planning [[:planning-info [:planning-keyword [:planning-kw-scheduled]] [:timestamp-active [:ts-inner [:ts-inner-wo-time [:ts-date "2019-11-27"] [:ts-day "Wed"]] [:ts-modifiers [:ts-repeater [:ts-repeater-type "+"] [:ts-mod-value "1"] [:ts-mod-unit "d"]]]]]]], :tags []}}]}
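
The output above is plain nested Clojure data, so an application can consume it with core functions alone. The following is a minimal sketch, not part of org-parser: the helpers title-text and nodes-tagged are hypothetical names introduced for illustration, and the result literal is copied from the output above rather than produced by a live call.

```clojure
(ns example.consume
  (:require [clojure.string :as str]))

;; Copied verbatim from the `lein run` output shown above.
(def result
  {:headlines
   [{:headline
     {:level 1
      :title [[:text-sty-bold "Header"] [:text-normal " with repeater"]]
      :planning [[:planning-info
                  [:planning-keyword [:planning-kw-scheduled]]
                  [:timestamp-active
                   [:ts-inner
                    [:ts-inner-wo-time [:ts-date "2019-11-27"] [:ts-day "Wed"]]
                    [:ts-modifiers
                     [:ts-repeater
                      [:ts-repeater-type "+"]
                      [:ts-mod-value "1"]
                      [:ts-mod-unit "d"]]]]]]]
      :tags []}}]})

(defn title-text
  "Hypothetical helper: joins the string parts of a headline's :title,
  dropping the style tags such as :text-sty-bold."
  [headline]
  (->> (:title headline)
       (map second)
       (str/join)))

(defn nodes-tagged
  "Hypothetical helper: returns every vector in `tree` whose first
  element is `tag`, found by a depth-first walk over nested collections."
  [tag tree]
  (->> (tree-seq coll? seq tree)
       (filter #(and (vector? %) (= tag (first %))))))

;; Print level, plain-text title, and scheduled dates for each headline.
(doseq [{:keys [headline]} (:headlines result)]
  (println (:level headline)
           (title-text headline)
           (map second (nodes-tagged :ts-date (:planning headline)))))
```

Since the structure of the AST can still change (see Project State), code like this should look nodes up by tag rather than by fixed position where possible, which is what nodes-tagged does.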

Java

First, compile org-parser with:

lein uberjar

Then run java -jar target/uberjar/org-parser-*-SNAPSHOT-standalone.jar file.org, for example:

java -jar target/uberjar/org-parser-*-SNAPSHOT-standalone.jar test/org_parser/fixtures/schedule_with_repeater.org
{:headlines [{:headline {:level 1, :title [[:text-sty-bold "Header"] [:text-normal " with repeater"]], :planning [[:planning-info [:planning-keyword [:planning-kw-scheduled]] [:timestamp-active [:ts-inner [:ts-inner-wo-time [:ts-date "2019-11-27"] [:ts-day "Wed"]] [:ts-modifiers [:ts-repeater [:ts-repeater-type "+"] [:ts-mod-value "1"] [:ts-mod-unit "d"]]]]]]], :tags []}}]}

Note: The * character must be replaced with the current version number of org-parser. See the clojars badge at the top of this README.

License