Parse semantic blocks where appropriate
schoettl opened this issue · 0 comments
In #7, we came to the conclusion that it's good to parse semantic blocks (instead of only line-based parsing), but only if it's possible and clean in EBNF/instaparse.
Here is a list of some semantic blocks that would need changes in EBNF:
- property drawers
- drawers
- blocks (#+BEGIN_xxx)
- dynamic blocks (#+BEGIN:)
- tables
- fixed-width areas (: sample code)
- footnotes (can span multiple lines)
- text paragraphs (maybe?)
- … (?)
The following elements cannot be parsed as semantic elements:
Some of them are already defined in EBNF but not yet "activated".
Quoting from #11:
In this branch, I work on the higher level syntax according to https://orgmode.org/worg/dev/org-syntax.html
Specifically, I want to check whether we can move away from line-based parsing towards more semantic blocks, called "elements". The Org mode parser used for export is also called org-element.el.
The spec says that most elements of the syntax are not context-free, and it groups them into the categories “greater elements”, “elements”, and “objects”.
Greater elements are e.g. #+BEGIN_EXAMPLE blocks. Some of these blocks contain raw text (EXAMPLE, SRC, COMMENT, ...), others can contain formatted text (CENTER, QUOTE, ...). Hence, it's better to parse context-aware: keep the multi-line content of an EXAMPLE block as raw text, but parse the content of a CENTER block as formatted text.
Also, paragraphs, multi-line footnote definitions, lists, tables, and property drawers are probably better parsed as units than line by line.
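To illustrate the raw-versus-formatted distinction, here is a minimal Python sketch (not the project's instaparse grammar). The names `RAW_BLOCKS`, `parse_inline` and `parse_block` are hypothetical, and the inline parser is a stand-in that only recognizes *bold*; treating every non-raw block as formatted is also an assumption.

```python
import re

# Assumption: these block names keep their content verbatim;
# every other block (CENTER, QUOTE, ...) is parsed as formatted text.
RAW_BLOCKS = {"EXAMPLE", "SRC", "COMMENT"}

BEGIN_RE = re.compile(r"^\s*#\+BEGIN_(\S+)", re.IGNORECASE)
BOLD_RE = re.compile(r"(\*[^*]+\*)")


def parse_inline(line):
    """Hypothetical stand-in for an object parser: only *bold* is recognized."""
    tokens = []
    # re.split with a capturing group puts the matches at odd indices.
    for index, part in enumerate(BOLD_RE.split(line)):
        if not part:
            continue
        if index % 2 == 1:
            tokens.append(("bold", part[1:-1]))
        else:
            tokens.append(("text", part))
    return tokens


def parse_block(lines):
    """Parse one block given as a list of lines from #+BEGIN_x to #+END_x."""
    match = BEGIN_RE.match(lines[0])
    if match is None:
        raise ValueError("not a block: %r" % lines[0])
    name = match.group(1).upper()
    if lines[-1].strip().upper() != "#+END_" + name:
        raise ValueError("missing #+END_" + name)
    body = lines[1:-1]
    is_raw = name in RAW_BLOCKS
    # Context-aware step: the block name decides how the body is treated.
    content = body if is_raw else [parse_inline(line) for line in body]
    return {"name": name, "raw": is_raw, "content": content}


example = parse_block(["#+BEGIN_EXAMPLE", "a *b* c", "#+END_EXAMPLE"])
center = parse_block(["#+BEGIN_CENTER", "a *b* c", "#+END_CENTER"])
```

With the same body line, `example["content"]` stays the untouched string, while `center["content"]` becomes a list of text and bold tokens. A purely line-based parser cannot make that choice, because it does not know which block a line belongs to.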
Parsing semantic blocks can later be enabled by changing the EBNF:
- <line> = (headline / drawer-begin-line / drawer-end-line / … / content-line) eol
+ <line> = (headline / drawer / … / content-line) eol
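The effect of this change can be sketched outside the grammar as well. The following Python snippet is only an illustration of the idea, not the instaparse rules: `group_drawers` and the two regular expressions are hypothetical, and the sketch ignores headlines and all other element types.

```python
import re

# Assumption: simplified drawer delimiters, e.g. ":LOGBOOK:" and ":END:".
DRAWER_BEGIN = re.compile(r"^\s*:([\w-]+):\s*$")
DRAWER_END = re.compile(r"^\s*:END:\s*$", re.IGNORECASE)


def group_drawers(lines):
    """Turn drawer-begin ... drawer-end line runs into single drawer units."""
    elements = []
    i = 0
    while i < len(lines):
        begin = DRAWER_BEGIN.match(lines[i])
        if begin and not DRAWER_END.match(lines[i]):
            # Look ahead for the closing :END: line.
            for j in range(i + 1, len(lines)):
                if DRAWER_END.match(lines[j]):
                    elements.append(("drawer", begin.group(1), lines[i + 1:j]))
                    i = j + 1
                    break
            else:
                # Unterminated drawer: fall back to a plain content line.
                elements.append(("content-line", lines[i]))
                i += 1
        else:
            elements.append(("content-line", lines[i]))
            i += 1
    return elements


parsed = group_drawers([
    "text",
    ":LOGBOOK:",
    "CLOCK: [2019-01-01]",
    ":END:",
    "more",
])
```

Here the three drawer lines come back as one `("drawer", "LOGBOOK", [...])` element between two content lines, which is roughly what replacing `drawer-begin-line / drawer-end-line` with `drawer` would give at the grammar level.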