rasendubi/uniorg

Optionally parse org data-trees as nodes instead of pages

ispringle opened this issue · 2 comments

It'd be great if there was a way to parse an org file's data-tree and convert it into nodes of data as opposed to the more markdown-style file parsing where the file is a single entity.

For example, given this org file:

* blog
** This is a post
    Here is some post content
** This is another post
    This is some different post content

Currently, the above file gets parsed as a single entity, and you end up with an <h1>blog</h1> followed by the h2 headings under it. If we parsed this in a more org-ish way and treated headings as nodes on a data-tree, we'd end up with a data structure such as:

nodes: [
  blog: {
    content: "...",
    nodes: [
      "This is a post": {...},
      "This is another post": {...},
    ],
    ...
  },
  ...
]

Hey.

This structure is less "org-ish" because it doesn't follow org structure. Examples of cases that would be hard to handle:

  • inlinetasks (headings in-between content)
  • headings that don't nest nicely: *** headings under * ones
  • repeated heading titles
  • the order of headings is almost lost
  • can your blog posts have any heading? Should all headings be nodes or should we apply an arbitrary rule? (e.g., to only lift headings with ids as org-roam does)

I'd say that this is a rather specific use case (making all headings into "nodes") and I wouldn't implement it. The good news is that it is easy to do yourself: you could traverse org-data and section nodes and lift their section children as nodes (if they satisfy your lifting condition).
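The traversal described above could be sketched roughly as follows. This is only an illustration, not uniorg's API: `OrgNode` is a simplified, hypothetical stand-in for the real AST types, and `liftSections`, `LiftedNode`, and `shouldLift` are names made up for this example. It assumes each `section` node carries its `headline` as a direct child and nests its sub-sections as further children.

```typescript
// Simplified, hypothetical stand-in for the AST node shapes (assumption).
interface OrgNode {
  type: string;
  children?: OrgNode[];
  rawValue?: string; // headline text, set on headline nodes in this sketch
  level?: number; // headline depth, set on headline nodes in this sketch
}

// Hypothetical result shape: one entry per lifted heading.
interface LiftedNode {
  title: string;
  level: number;
  section: OrgNode;
}

// Return the first direct child with type "headline", if any.
function findHeadline(section: OrgNode): OrgNode | undefined {
  for (const child of section.children ?? []) {
    if (child.type === "headline") return child;
  }
  return undefined;
}

// Walk the tree depth-first and collect every section whose headline
// satisfies the lifting condition. Results keep document order.
function liftSections(
  root: OrgNode,
  shouldLift: (headline: OrgNode, section: OrgNode) => boolean
): LiftedNode[] {
  const lifted: LiftedNode[] = [];
  const visit = (node: OrgNode): void => {
    if (node.type === "section") {
      const headline = findHeadline(node);
      if (headline !== undefined && shouldLift(headline, node)) {
        lifted.push({
          title: headline.rawValue ?? "",
          level: headline.level ?? 0,
          section: node,
        });
      }
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return lifted;
}

// Hand-built tree approximating the blog example from the issue.
const tree: OrgNode = {
  type: "org-data",
  children: [
    {
      type: "section",
      children: [
        { type: "headline", level: 1, rawValue: "blog" },
        {
          type: "section",
          children: [
            { type: "headline", level: 2, rawValue: "This is a post" },
            { type: "paragraph" },
          ],
        },
        {
          type: "section",
          children: [
            { type: "headline", level: 2, rawValue: "This is another post" },
            { type: "paragraph" },
          ],
        },
      ],
    },
  ],
};

// Example lifting condition: treat every level-2 heading as a post.
const posts = liftSections(tree, (headline) => headline.level === 2);
```

Returning an array rather than an object keyed by title keeps heading order and tolerates repeated titles. Note that the sketch keeps descending into lifted sections, so a lifted heading nested inside another lifted heading shows up both as its own entry and inside its parent's `section`.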

(Lifting all headings with IDs as nodes is more common (org-roam) and I would love to see that as a library.)

Yes, there would need to be some property value that signals the heading is a leaf and not another node in the tree. ox-hugo does this by treating any heading whose property drawer contains an :EXPORT_FILE_NAME: property as a leaf, and everything under it as content to be transformed into HTML. All my blog posts already contain an ID, so perhaps that would be a good avenue to pursue.
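A property-based leaf check along those lines might look like the sketch below. `SectionLike`, `getProperty`, and `isLeafPost` are hypothetical names, and the node shapes are an assumption: a `property-drawer` node sitting directly under its `section`, holding `node-property` children with `key` and `value` fields. The real AST may differ.

```typescript
// Simplified, hypothetical node shape (assumption, not the library's types).
interface SectionLike {
  type: string;
  children?: SectionLike[];
  key?: string; // property name, set on node-property nodes in this sketch
  value?: string; // property value, set on node-property nodes in this sketch
}

// Look up a property in the section's own property drawer.
// Org property names are case-insensitive, so keys are compared upper-cased.
function getProperty(section: SectionLike, key: string): string | undefined {
  const wanted = key.toUpperCase();
  for (const child of section.children ?? []) {
    if (child.type !== "property-drawer") continue;
    for (const prop of child.children ?? []) {
      if (
        prop.type === "node-property" &&
        prop.key !== undefined &&
        prop.key.toUpperCase() === wanted
      ) {
        return prop.value;
      }
    }
  }
  return undefined;
}

// A heading counts as a leaf (a post) if it has an ID, as org-roam uses,
// or an EXPORT_FILE_NAME, as ox-hugo uses.
function isLeafPost(section: SectionLike): boolean {
  return (
    getProperty(section, "ID") !== undefined ||
    getProperty(section, "EXPORT_FILE_NAME") !== undefined
  );
}
```

Only the section's own drawer is inspected, so a parent heading without an ID is not treated as a leaf just because one of its children has one.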