The Org Element API allows parsing & manipulating org-mode files in the form of lisp objects.
A cache for these objects is already included in org-mode and disabled by default.
However, when working with large numbers (> 1000) of files, populating and processing this cache for each file takes a long time.
This package tries to alleviate this problem by:
- Allowing users to register hooks that compute some data form an org element
- Persisting the results on disk
- Taking care of keeping the cached data in sync with the actual buffer contents
In the current version, only hooks on a file-level are implemented. Future versions might include a way to register hooks for element types.
A cache has four components:
- A hash table with a plist for each file managed by the cache
- A list of folders used to find files managed by the cache
- A list of hooks to generate cached data
- A file to persist the cache in
A cache is populated with entries by opening each file in a temporary
buffer, parsing it to an org-element
and passing this element to
each of the hooks.
Doing this for the first time can take a few seconds, depending on the number of files in the cache.
Once the cache has been saved to disk for the first time, a
combination of the find
and sha1sum
shell commands is used to
determine which entries need to be updated.
Assuming you’re only using one emacs instance at a time, updating the cache on startup should take only a few milliseconds.
The def-org-el-cache
macro can be used to define a new cache.
(def-org-el-cache
my-cache ;; name of the cache / variable to store it in
(list "~/org") ;; directories managed by this cache
"~/org/.cache.el" ;; file to persist the cache in
)
Hooks can be added to a cache with the org-el-cache-add-hook
function.
(org-el-cache-add-hook
my-cache ;; Cache to add the hook to
:my-property ;; Property name to use for this hook
(lambda (filename element)
;; Do something with the element and return some value
))
(org-el-cache-update cache)
updates all outdated entries in the cache(org-el-cache-force-update cache)
re-initializes the cache
If you changed the definition of a hook or added a new one,
use org-el-cache-force-update
to re-initialize the cache.
(org-el-cache-get cache filename)
returns the cache entry for a file(org-el-cache-file-property cache filename prop)
returns the value of a files property
The following functions work on all entries of a cache.
For more information on them, refer to the functions documentation
(e.g. C-h f org-el-cache-map
).
org-el-cache-reduce
org-el-cache-map
org-el-cache-select
org-el-cache-each
All caches are saved to disk at regular intervals (configurable via
org-el-cache-presist-interval
, defaults to 10 minutes).
Usually, there is no need to manually persist or load a cache. If you want to do so anyway, you can use the following functions:
(org-el-cache-persist-all)
persists all caches(org-el-cache-persist cache)
persists a single cache(org-el-cache-load cache)
loads a cache from disk
examples/file-selector.el contains the code for a file-selector (similar to Emacs’ find-file) that uses the titles of files instead of their paths.
For more information, you can check out the integration tests in org-el-cache-test.el.
Cached data is limited to objects that can be printed / read (serialized / deserialized).
This means that there is no (elegant) way to cache markers in files, e.g. when using cached headline data for org-agenda views.
A possible workaround would be attaching IDs to every headline, then using this ID (instead of a marker) to jump to a headline e.g. to change its TODO state.
org-el-cache uses a number of shell commands to find files that need to be updated.
find
xargs
sha1sum
These should be installed by default on most Linux / Unix distros.
The hash of a file or buffer is used to determine if it has changed since it was last processed by the cache.
When updating a single file, (buffer-hash)
is fast enough.
To speed up recursively searching directories for .org
files and
calculating their hashes, the find
and sha1sum
shell commands are
used instead of directory-files-recursively
and buffer-hash
.
Working on a collection of ~1500 files with ~200k lines in total, initializing the cache takes around ~36s on my machine (Thinkpad L470, SSD).
Updating the cache once it has been initialized / loaded from disk takes around 200ms.
Org-mode already includes a cache for parsed elements. This is disabled by default since there seem to be problems with the implementation that cause Emacs to hang after making changes to org files.
In the long term, it would be nice to reuse as much of the existing code as possible and figure out where the bugs are.
- Use
secure-hash
instead ofbuffer-hash
- Use
defun
instead ofdefmethod
, the function help looks nicer this way and we don’t need overloaded methods
Copyright © Leon Rische and contributors. Distributed under the GNU General Public License, Version 3