This is a SQL backend for Emacs Org-Mode. It scans through text files formatted in org-mode, parses them, and adds key information such as todo keywords, timestamps, and links to a relational database. For now only SQLite is supported.
Despite the fact that Emacs is the (second?) greatest text editor of all time, it is not a good data analysis platform, and neither is Lisp a good data analysis language. This is the strong-suite of other languages such as Python and R which have powerful data manipulation packages (dplyr
and pandas
) as well as specialized presentation platforms (shiny
and dash
). The common thread between these data analysis tools is a tabular data storage system, for which SQL is the most common language.
Therefore, the primary goal of org-sql
is to provide a link between the text-based universe of Emacs / Org-mode and the table-based universe of data analysis platforms.
A common use case for org-mode is a daily planner. Within this use case, some questions that can easily be answered with a SQL-backed approach:
- How much time do I spend doing pre-planned work? (track how much time is spent clocking)
- How well do I estimate how long tasks take? (compare effort and clocked time)
- Which types of tasks do I concentrate on the most? (tag entries and sort based on effort and clocking)
- How indecisive am I? (track how many times schedule or deadline timestamps are changed)
- How much do I overplan? (track number of canceled tasks)
- How much do I delegate (track properties indicating which people are working on tasks)
- How many outstanding tasks/projects do I have? (count keywords and tags on headlines)
There are other uses for an Org-mode SQL database. If one has many org files scattered throughout their filesystem, a database is an easy way to aggregate key information such as links or timestamps. Or if one primary uses org-mode for taking notes, it could be a way to aggregate and analyze meeting minutes.
Of course, these could all be done directly in Org-mode with Lisp code (indeed there are already built-in functions for reporting aggregated effort and clock-time). But why do that when one could analyze all org files by making a descriptive dashboard with a relatively few lines of R or Python code?
Download the package from MELPA
M-x package-install RET org-sql RET
Alternatively, clone this repository into your config directory
git clone git@github.com:ndwarshuis/org-sql.git ~/config/path/org-sql/
Once obtained, add the package to load-path
and require it
(add-to-list 'load-path "~/config/path/org-sql/")
(require 'org-sql)
One can also use use-package
to automate this entire process
(use-package org-sql
:ensure t
;; add config options here...
:config)
Emacs-Lisp packages
- Org-mode (tested with 9.x)
- dash (tested with 2.x)
Installed programs
- sqlite (tested with 3.x)
org-sql-sqlite-path
: the path to the sqlite database to be createdorg-sql-files
: list of org files to insert into databaseorg-sql-pragma
: pragma to use for new connections (useful for performance tuning)org-sql-buffer
: the name of the buffer for the SQLite connectionorg-sql-debug
: turn on SQL transaction debug output in the message buffer
These options control what data gets stored in the database, and are useful to minimize the size of the database as well as the time it takes to update:
org-sql-ignored-properties
: list of properties to ignoreorg-sql-ignored-tags
: list of tags to ignoreorg-sql-ignored-link-types
: list link types to ignore (eg mu4e, file)org-sql-included-healine-planning-types
: planning types (eg:closed
,:scheduled
) to includeorg-sql-included-contents-timestamp-types
: type of timestamps (egactive
,inactive
) to includeorg-sql-included-logbook-types
: types of logbook entries to include (egnote
,reschedule
, etc)org-sql-use-tag-inheritance
: add inherited tags to the databaseorg-sql-store-clocks
: whether to include clocks (nil implies no clock notes are desired either)org-sql-store-clock-notes
: whether to include clock notes
Much of the extracted data from org-sql
pertains to logbook entries, and there are a number of settings that effect how this data is generated in org files and how it may be parsed reliably.
Firstly, one needs to set the relevant org-mode
variables in order to capture logging information. Please refer to the documentation in org-mode
itself for their meaning:
org-log-done
org-log-reschedule
org-log-redeadline
org-log-note-clock-out
org-log-refile
org-log-repeat
org-todo-keywords
(in this one can set which todo keywords changes are logged)
Obtaining the above information for the database assumes that org-log-note-headings
is left at its default value. This limitation may be surpassed in the future.
Additionally, for best results it is recommended that all logbook entries be contained in their own drawer. This means that org-log-into-drawer
should be set to LOGBOOK
and org-clock-into-drawer
should be set to t
(which means clocks go into a drawer with hardcoded name LOGBOOK
). Without these settings, org-sql
needs to guess where the logbook entries are based on location and pattern matching, which is not totally reliable.
Run org-sql-user-reset
. This will create a new database and initialize it with the default schema. It will also delete an existing database before creating the new one if it exists in org-sql-sqlite-path
.
Run org-sql-user-update
. This will synchronize the database with all files as indicated in org-sql-files
by first checking if the file is in the database and inserting it if not. If the file is already present, it will check the md5 to assess if updates are needed. This function will insert the entire content of any org file that is either new or changed.
Note that the database will take several seconds to minutes if inserting many files depending on the speed of your device (particularly IO) and the size/number of files. This operation will also block Emacs until complete.
Run org-sql-user-clear-all
. This will clear all data but leave the schema.
The database is arranged by files at the top level and by org-mode
features moving down to child tables. Primary keys are foreign keys are marked with P and F in parens respecively. All dates are converted into unix time integers before entering into the database.
The databases are arranged as follows according to their foreign key contraints:
- files
- headlines
- tags
- properties
- clocking
- logbook
- state_changes
- planning_changes
- headlines
Stores one row per org file
column | type | description |
---|---|---|
file_path (P) | text | path to an org file |
md5 | text | md5 sum of the org file |
size | integer | size of the file in bytes |
time_modified | integer | unused |
time_created | integer | unused |
time_accessed | integer | unused |
Stores one row for each headline in a given org file as well as any atomic metadata attached to that headline
column | type | description |
---|---|---|
file_path (PF) | text | path to file containing the headline |
headline_offset (F) | integer | offset of the headline’s first character in the org file |
tree_path (P) | text | outline tree path of the headline |
headline_text | text | raw text of the headline |
keyword | text | the TODO state keyword |
effort | integer | the value of the Effort property in minutes |
priority | char | character value of the priority |
archived | bool | true if the headline has an archive tag |
commented | bool | true if the headline has a comment keyword |
contents | text | unused |
Stores one row for each link in a given org file and headline
column | type | description |
---|---|---|
file_path (PF) | text | path to file containing the link |
headline_offset (F) | integer | offset of the headline’s first character in the org file |
link_offset (P) | integer | offset of the link’s first character in the org file |
link_path | text | target of the link (url or similar) |
link_text | text | display text of the link in the org buffer |
link_type | text | type of the link (eg http, mu4e, file, etc) |
Stores one row for each property identified in an org file and a given headline. This is also where one could identify if a headline is archived as it will have properties such as ARCHIVE_TIME
column | type | description |
---|---|---|
file_path (PF) | text | path to file containing the property |
headline_offset (F) | integer | offset of the headline’s first character in the org file |
property_offset (P) | integer | offset of the property’s first character in the org file |
key_text | text | the property key |
val_text | text | the property value |
inherited | boolean | unused |
Stores tags similarly to the properties table
column | type | description |
---|---|---|
file_path (PF) | text | path to file containing the tag |
headline_offset (PF) | integer | offset of the headline’s first character in the org file |
tag (P) | text | the tag value |
inherited (P) | boolean | 1 if inherited, 0 if not |
Stores one row for each clock entry identified in an org file and its given headline
column | type | description |
---|---|---|
file_path (PF) | text | path to file containing the clock |
headline_offset (F) | integer | offset of the headline’s first character in the org file |
clock_offset (P) | integer | offset of the clock’s first character in the org file |
time_start | integer | timestamp for the start of the clock |
time_end | integer | timestamp for the end of the clock |
clock_note | text | the note text beneath the clock if available |
Stores one row for each entry in the logbook underneath a headline (excluding clocks). Some entries may have additional information associated with them for planning and state changes as given in the child tables below.
column | type | description |
---|---|---|
file_path (PF) | text | path to file containing the entry |
headline_offset (F) | integer | offset of the headline’s first character in the org file |
entry_offset (P) | integer | offset of the entry’s first character in the org file |
entry_type | text | type of the logbook entry (see org-log-note=headines ) |
time_logged | integere | timestamp for when the entry was taken |
header | text | the first line of the note, usually standardized |
note | note | the text underneath the entry header |
note: the header should match org-log-note-headings
unless it is a clock note
Stores one row per logbook entry with state change information (as triggered by any keywords configured to log in org-todo-headings
)
column | type | description |
---|---|---|
file_path (PF) | text | path to file containing the entry |
entry_offset (PF) | integer | offset of the clock’s first character in the org file |
state_old | text | former todo state keyword |
state_new | text | updated todo state keyword |
Stores one row per logbook entry with planning changes as triggered by setting org-log-reschedule
and org-log-redeadline
.
column | type | description |
---|---|---|
file_path (PF) | text | path to file containing the entry |
entry_offset (PF) | integer | offset of the entry’s first character in the org file |
timestamp_offset (F) | integer | offset of the timestamp for the former planning entry |
In the future, this will have one row for each timestamp under a headline. For now it just refers to timestamps in the planning block (eg CLOSED, SCHEDULED, or DEADLINE) and timestamps in logbook headings for planning changes. It does not include the timestamp for the time a log entry was taken (this is stored in the logging table above) nor does it store timestamps for clocks (stored in the clocking table). Eventually this table will include timestamps in logbook notes as well as headline contents.
column | type | description |
---|---|---|
file_path (PF) | text | path to file containing the entry |
headline_offset (F) | integer | offset of the entry’s first character in the org file |
timestamp_offet (P) | integer | offset of the timestamp for the former planning entry |
raw_value | text | the string of the timestamp as it appears in org buffers |
type | text | either inactive or active |
planning_type | text | either closed, scheduled, deadline, or null |
warning_type | text | the warning type (all or first) |
warning_value | integer | value of warning shift |
warning_unit | text | unit for warning (hour, day, week, month, year) |
repeat_type | text | the repeater type (catch-up, restart, cumulate) |
repeat_value | integer | value of repeater shift |
repeat_unit | text | unit for repeater (hour, day, week, month, year) |
time | integer | the time of the timestamp in unix time |
time_end | integer | like time but for the end if this is a range |
resolution | text | minute if timestamp specifies hours/minutes, else day |
resolution_end | text | like resolution but for the end if this is a range |
NOTES:
- Inactive-range and active-range will not appear in the type column. Use time_end instead to determine if the timestamp is a ranged timestamp
- There is no distinction in this table alone between planning timestamps that are part of the planning section (eg
CLOSED: [whatever]
) and those from logbook entries that pertain to planning changes (egRemoved deadline, was [whatever]
). This distinction requires joining theplanning_changes
table, which will only reference logbook planning change timestamps.
Pull requests welcome, especially those for other SQL implementations.
The idea for this is based on John Kitchin’s implementation, which uses emacsql
as the SQL backend.