Persistent cache for data derived from org-elements

Org Element Cache

Introduction

The Org Element API allows parsing and manipulating org-mode files in the form of Lisp objects.

A cache for these objects is already included in org-mode but is disabled by default.

However, when working with large numbers (> 1000) of files, populating and processing this cache for each file takes a long time.

This package tries to alleviate this problem by:

  1. Allowing users to register hooks that compute some data from an org element
  2. Persisting the results on disk
  3. Taking care of keeping the cached data in sync with the actual buffer contents

In the current version, only file-level hooks are implemented. Future versions might include a way to register hooks for element types.

Structure of a Cache

A cache has four components:

  1. A hash table with a plist for each file managed by the cache
  2. A list of folders used to find files managed by the cache
  3. A list of hooks to generate cached data
  4. A file to persist the cache in
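
The four components can be pictured as a small struct. This is only an illustrative sketch: the struct name `sketch-cache` and its slot names are assumptions made for this example and are not taken from the package's source.

```elisp
;; Hypothetical sketch; names are illustrative, not the package's internals.
(require 'cl-lib)

(cl-defstruct sketch-cache
  (table (make-hash-table :test 'equal)) ; filename -> plist of cached values
  folders                                ; list of directories to search
  hooks                                  ; alist of (PROPERTY . FUNCTION)
  file)                                  ; path of the file used for persistence

(defvar sketch-cache-example
  (make-sketch-cache :folders (list "~/org")
                     :file "~/org/.cache.el"))

;; Store a plist for one file, then read a single property back.
(puthash "~/org/notes.org" (list :title "Notes")
         (sketch-cache-table sketch-cache-example))

(plist-get (gethash "~/org/notes.org"
                    (sketch-cache-table sketch-cache-example))
           :title)
```

Keying the table by filename with an `equal` test is what allows one plist per file; the actual implementation may organize this differently.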

Populating the Cache

A cache is populated with entries by opening each file in a temporary buffer, parsing it to an org-element and passing this element to each of the hooks.

Doing this for the first time can take a few seconds, depending on the number of files in the cache.
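
The per-file step described above might look roughly like the following. It is a minimal sketch, not the package's code: `sketch-populate-entry` and `sketch-count-headlines` are hypothetical names, and the hooks are assumed to be an alist of (PROPERTY . FUNCTION).

```elisp
(require 'org)
(require 'org-element)

(defun sketch-populate-entry (filename hooks)
  "Parse FILENAME and return a plist built from HOOKS.
HOOKS is assumed to be an alist of (PROPERTY . FUNCTION); each
FUNCTION is called with FILENAME and the parsed element."
  (with-temp-buffer
    (insert-file-contents filename)
    ;; The parser expects an org-mode buffer; skip user mode hooks for speed.
    (delay-mode-hooks (org-mode))
    (let ((element (org-element-parse-buffer))
          (entry nil))
      (dolist (hook hooks)
        (setq entry (plist-put entry (car hook)
                               (funcall (cdr hook) filename element))))
      entry)))

;; Example hook: count the headlines in a file.
(defun sketch-count-headlines (_filename element)
  "Return the number of headlines in ELEMENT."
  (length (org-element-map element 'headline #'identity)))

;; Usage, for an existing file:
;; (sketch-populate-entry "~/org/notes.org"
;;                        (list (cons :headline-count #'sketch-count-headlines)))
```

Because the file is parsed once and the same element is handed to every hook, adding hooks costs much less than adding files.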

Once the cache has been saved to disk for the first time, a combination of the find and sha1sum shell commands is used to determine which entries need to be updated.

Assuming you’re only using one Emacs instance at a time, updating the cache on startup should take only a fraction of a second.
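
As a rough illustration of the find and sha1sum approach, the sketch below collects hashes for all .org files and compares them with stored ones. The function names are hypothetical, the exact shell pipeline used by the package may differ, and GNU versions of find, xargs and sha1sum are assumed.

```elisp
(defun sketch-parse-sha1sum-output (output)
  "Turn sha1sum OUTPUT into an alist of (FILENAME . HASH)."
  (let (result)
    (dolist (line (split-string output "\n" t))
      ;; Each line: 40 hex digits, a space, a mode character, the filename.
      (when (string-match "\\`\\([0-9a-f]\\{40\\}\\) [ *]\\(.*\\)\\'" line)
        (push (cons (match-string 2 line) (match-string 1 line)) result)))
    (nreverse result)))

(defun sketch-file-hashes (directory)
  "Return an alist of (FILENAME . HASH) for all .org files below DIRECTORY."
  (sketch-parse-sha1sum-output
   (shell-command-to-string
    (format "find %s -name '*.org' -print0 | xargs -0 sha1sum"
            (shell-quote-argument (expand-file-name directory))))))

(defun sketch-outdated-files (current stored)
  "Return files in CURRENT whose hash is missing from or differs in STORED.
Both arguments are alists of (FILENAME . HASH)."
  (let (outdated)
    (dolist (pair current)
      (unless (equal (cdr pair) (cdr (assoc (car pair) stored)))
        (push (car pair) outdated)))
    (nreverse outdated)))
```

Only the files returned by `sketch-outdated-files` would need to be parsed again, which is why an update is so much cheaper than the initial run.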

Usage

Defining Caches

The def-org-el-cache macro can be used to define a new cache.

(def-org-el-cache
  my-cache           ;; name of the cache / variable to store it in
  (list "~/org")     ;; directories managed by this cache
  "~/org/.cache.el") ;; file to persist the cache in

Adding Hooks

Hooks can be added to a cache with the org-el-cache-add-hook function.

(org-el-cache-add-hook
 my-cache               ;; Cache to add the hook to
 :my-property           ;; Property name to use for this hook
 (lambda (filename element)
   ;; Do something with the element and return some value
   ))
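
A concrete hook could, for example, extract the title of a file. The function below is a hypothetical sketch (the name `sketch-title-hook` is not part of the package); it only relies on the hook signature shown above and on the Org Element API.

```elisp
(require 'org)
(require 'org-element)

(defun sketch-title-hook (filename element)
  "Return the #+TITLE of ELEMENT, falling back to the base name of FILENAME."
  (or (org-element-map element 'keyword
        (lambda (kw)
          ;; The parser stores keyword names in upper case.
          (when (string= (org-element-property :key kw) "TITLE")
            (org-element-property :value kw)))
        nil t) ; no INFO, stop at the first match
      (file-name-base filename)))

;; Registration, following the call shape shown above:
;; (org-el-cache-add-hook my-cache :title #'sketch-title-hook)
```

Returning a plain string keeps the cached value printable, so it can be persisted to disk.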

Updating Caches

  • (org-el-cache-update cache) updates all outdated entries in the cache
  • (org-el-cache-force-update cache) re-initializes the cache

If you changed the definition of a hook or added a new one, use org-el-cache-force-update to re-initialize the cache.

Accessing Cached Data

  • (org-el-cache-get cache filename) returns the cache entry for a file
  • (org-el-cache-file-property cache filename prop) returns the value of a file’s property

The following functions work on all entries of a cache. For more information on them, refer to the functions’ documentation (e.g. C-h f org-el-cache-map).

  • org-el-cache-reduce
  • org-el-cache-map
  • org-el-cache-select
  • org-el-cache-each

Persisting Caches

All caches are saved to disk at regular intervals (configurable via org-el-cache-presist-interval, defaults to 10 minutes).

Usually, there is no need to manually persist or load a cache. If you want to do so anyway, you can use the following functions:

  • (org-el-cache-persist-all) persists all caches
  • (org-el-cache-persist cache) persists a single cache
  • (org-el-cache-load cache) loads a cache from disk

Example

examples/file-selector.el contains the code for a file-selector (similar to Emacs’ find-file) that uses the titles of files instead of their paths.

For more information, you can check out the integration tests in org-el-cache-test.el.

Prior / Similar Art

Limitations

Cached data is limited to objects that can be printed / read (serialized / deserialized).

This means that there is no (elegant) way to cache markers in files, e.g. when using cached headline data for org-agenda views.
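
The limitation can be seen with a small print / read round trip. This sketch uses only built-in functions; `sketch-roundtrip` is a hypothetical helper and not necessarily how the package serializes its data.

```elisp
(defun sketch-roundtrip (object)
  "Serialize OBJECT with `prin1-to-string' and read it back."
  (car (read-from-string (prin1-to-string object))))

;; Plists of strings, numbers and lists survive the round trip.
(sketch-roundtrip '(:title "Notes" :tags ("a" "b")))

;; A marker prints as #<marker ...>, which the reader rejects.
(condition-case err
    (sketch-roundtrip (point-marker))
  (invalid-read-syntax
   (message "Cannot read back a marker: %S" err)))
```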

A possible workaround would be to attach an ID to every headline and then use this ID (instead of a marker) to jump to a headline, e.g. to change its TODO state.
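
A hook for this workaround might cache each headline's ID together with its title. The following is a sketch under the assumption that headlines already carry an ID property; `sketch-headline-ids-hook` is a hypothetical name.

```elisp
(require 'org)
(require 'org-element)

(defun sketch-headline-ids-hook (_filename element)
  "Return a list of (ID . TITLE) for every headline in ELEMENT with an ID."
  (org-element-map element 'headline
    (lambda (hl)
      ;; Entries of the property drawer appear as upper-case keys.
      (let ((id (org-element-property :ID hl)))
        (when id
          (cons id (org-element-property :raw-value hl)))))))
```

IDs can be assigned with Org's `org-id-get-create`. Given the filename from the cache entry and a cached ID, `org-id-find-id-in-file` can then locate the headline without a stored marker.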

Dependencies

org-el-cache uses a number of shell commands to find files that need to be updated.

  1. find
  2. xargs
  3. sha1sum

These should be installed by default on most Linux / Unix distros.

Hashing

The hash of a file or buffer is used to determine if it has changed since it was last processed by the cache.

When updating a single file, (buffer-hash) is fast enough.

To speed up recursively searching directories for .org files and calculating their hashes, the find and sha1sum shell commands are used instead of directory-files-recursively and buffer-hash.
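
For the single-buffer case, a change check could be as small as the sketch below. It uses the built-in `secure-hash` (which the changelog names as the replacement for buffer-hash); the function names are hypothetical.

```elisp
(defun sketch-buffer-sha1 (&optional buffer)
  "Return the SHA-1 hash of the text in BUFFER (default: current buffer)."
  (secure-hash 'sha1 (or buffer (current-buffer))))

(defun sketch-buffer-changed-p (stored-hash &optional buffer)
  "Return non-nil if BUFFER no longer matches STORED-HASH."
  (not (string= stored-hash (sketch-buffer-sha1 buffer))))

;; Usage: remember the hash, edit the buffer, check again.
(with-temp-buffer
  (insert "abc")
  (let ((hash (sketch-buffer-sha1)))
    (insert "d")
    (sketch-buffer-changed-p hash)))
```

For a buffer hash to agree with the sha1sum of the file on disk, the buffer presumably has to be unmodified and encoded the same way as the file; this sketch does not verify that.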

Benchmarks

Working on a collection of ~1500 files with ~200k lines in total, initializing the cache takes around 36s on my machine (Thinkpad L470, SSD).

Updating the cache once it has been initialized / loaded from disk takes around 200ms.

Relation to org-element-cache

Org-mode already includes a cache for parsed elements. This is disabled by default since there seem to be problems with the implementation that cause Emacs to hang after making changes to org files.

In the long term, it would be nice to reuse as much of the existing code as possible and figure out where the bugs are.

Changelog

[2020-02-08 Sat 12:50]

Internal

  • Use secure-hash instead of buffer-hash
  • Use defun instead of defmethod; the function help looks nicer this way and we don’t need overloaded methods

License

Copyright © Leon Rische and contributors. Distributed under the GNU General Public License, Version 3.