Orgmunge was born out of the desire to modify Org documents programmatically from within Python. The wonderful orgparse can read an Org document into a tree object but doesn’t offer an interface to modify the tree and write it back to file.
The original use case was trying to sync Outlook calendar items with
Org: whenever someone rescheduled a meeting, my Python script was
unable to reschedule the Org heading it had originally
created. Instead of forking orgparse, I decided to write an actual
grammar for an Org document and use PLY to generate a parser for it.
Now Org syntax is too sophisticated for me to claim that this first
attempt can parse everything. In fact, some folks way smarter than I
am (and with more formal training) have hinted that Org
syntax can’t be properly parsed with a context-free grammar. For such
reasons (and for my own lack of experience with writing grammars), I
have restricted the scope of this module to the features I care about:
for each heading, the headline components (the COMMENT keyword, the
todo state, priority, cookies, and tags) are all parsed, as well as
any scheduling timestamps and all the drawers. The heading contents
are treated as a blob of text and the only thing the parser extracts
from the contents are the timestamps. No attempts are made at parsing
things like tables or source code blocks further. orgmunge can also
parse out the document’s metadata and export options but the major
assumption it makes is that the document starts out with some optional
metadata and export options, followed by some optional initial body
text (not falling under any heading), and then a tree of headings. Any
export options or metadata that come later within the document are
treated as text (some heading’s content).
If you have built something on top of orgmunge, please open an issue
here and I’ll happily add your project to the use cases.
Replace important information in an Org file with random words in order to share the structure of the file with someone without compromising your information. See redactOrg
orgmunge is now on PyPI.
- You can install orgmunge using pip: python3 -m pip install orgmunge

To install it manually instead:
- The only dependency of orgmunge is PLY, so you need PLY installed.
- Clone this repo
- Add the directory where you cloned this repo to your PYTHONPATH
- The parser needs to know the set of valid keywords before it starts parsing your input. To do this, it uses the following steps:
  - If your input string/file contains per-file keywords, these take precedence over anything else.
  - Failing to find any such keywords, it looks to see if you passed it the keywords using the todos argument.
  - If no todo keywords were passed, the parser looks for a file named todos.json in one of 2 places (again in order of preference):
    - The current directory
    - The user’s home directory
  - Failing all the above, the keywords are assumed to be defined by:
    { "todo_states": { "todo": "TODO", "next": "NEXT", "wait": "WAIT" }, "done_states": { "cncl": "CNCL", "done": "DONE" } }
- If you choose to supply your own keywords as an argument to the parser, you must follow the above structure: separate todo_states and done_states, with pairs of keyword_nickname: keyword specifying each set of states.
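The order of preference described above can be sketched in plain Python. This is only an illustration, not orgmunge's actual code: the function name resolve_todo_keywords and its parameters are hypothetical, and only the todos.json file name, the search order and the default keyword structure are taken from the text.

```python
import json
from pathlib import Path

# Fallback keywords, copied from the structure documented above.
DEFAULT_TODOS = {
    "todo_states": {"todo": "TODO", "next": "NEXT", "wait": "WAIT"},
    "done_states": {"cncl": "CNCL", "done": "DONE"},
}


def resolve_todo_keywords(per_file=None, todos=None, search_dirs=None):
    """Hypothetical helper mirroring the documented order of preference."""
    if per_file:                      # 1. per-file keywords win
        return per_file
    if todos:                         # 2. keywords passed as an argument
        return todos
    if search_dirs is None:           # 3. todos.json: current dir, then home
        search_dirs = [Path.cwd(), Path.home()]
    for directory in search_dirs:
        candidate = Path(directory) / "todos.json"
        if candidate.is_file():
            with candidate.open(encoding="utf-8") as handle:
                return json.load(handle)
    return DEFAULT_TODOS              # 4. built-in defaults


# A custom keyword set following the required two-part structure.
custom = {
    "todo_states": {"todo": "TODO", "prog": "IN-PROGRESS"},
    "done_states": {"done": "DONE"},
}
chosen = resolve_todo_keywords(todos=custom)
```

A dict shaped like custom is the kind of value the todos argument is described as expecting.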
- The Org class in __init__.py is the main entry point to orgmunge. It can be used to read an Org tree either from a string or from a file:
    from orgmunge import Org
    org_1 = Org('* TODO Something important\n', from_file=False)  # \n needed to signify end of document
    org_2 = Org('/path/to/my/file.org')
    org_3 = Org('/path/to/my/file.org', debug=True)  # Print PLY debugging info
- The Org object has 3 main attributes you should care about:
  - Org.metadata stores the metadata and export options found at the beginning of the file. This is a dict mapping the option/keyword name to a list of its values (to allow for cumulative keywords such as #+OPTIONS). Example:
      org_1 = Org('#+title: Test\n', from_file=False)
      assert org_1.metadata['title'] == ['Test']
  - Org.initial_body stores any text between the metadata and the first heading.
  - Org.root stores the root of the Org tree. This is a heading with the headline ROOT whose only useful attribute is children, which is a list of the top-level headings in the given document.
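To see why each metadata value is a list, here is a simplified stand-in for the metadata pass. It is not orgmunge's PLY-based parser: collect_metadata is a hypothetical helper that only understands `#+key: value` lines at the very top of a string, and it keeps keys exactly as written.

```python
import re
from collections import defaultdict

KEYWORD_LINE = re.compile(r"^#\+(\w+):\s*(.*)$")


def collect_metadata(text):
    """Map each leading '#+key: value' line to a list of values (hypothetical helper)."""
    metadata = defaultdict(list)
    for line in text.splitlines():
        match = KEYWORD_LINE.match(line)
        if match is None:
            break  # anything after the leading block is treated as ordinary text
        key, value = match.groups()
        metadata[key].append(value)  # repeated keywords accumulate
    return dict(metadata)


sample = "#+title: Test\n#+OPTIONS: toc:nil\n#+OPTIONS: num:nil\n* Heading\n"
meta = collect_metadata(sample)
```

Because a keyword such as #+OPTIONS may appear more than once, storing a list per key keeps every occurrence instead of letting the last one overwrite the others.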
- The Org tree is a list of headings with parent, child and sibling relationships.
- A heading object consists of:
  - A headline
  - Contents:
    - Scheduling, if any
    - A list of Drawers, if any
    - Body text, if any
- Important attributes of a Heading object:
  - properties. This is a dict mapping property names to their values. The properties are parsed from the PROPERTIES drawer if it exists. This attribute can also be set by the user (the value supplied must be a dict).
  - inherited_properties. Same format as the properties dict but contains only properties inherited from ancestors.
  - tags returns a list of all tags (those explicitly defined for this heading and those inherited).
  - headline returns the heading’s headline. This attribute can also be set by the user (the value must be a Headline instance).
  - scheduling is a Scheduling object containing information about SCHEDULED/DEADLINE/CLOSED timestamps of the heading, if any. Can also be set by the user (the value must be a Scheduling instance).
  - drawers is a list of Drawer objects containing the drawers associated with this heading. When you update the heading’s properties attribute, the PROPERTIES drawer is updated the next time you access it.
  - children returns a list of Heading objects that are the direct children of this heading.
  - parent returns the parent heading of the current one. If the current heading is a top-level heading, the root heading will be returned.
  - sibling returns the sibling heading of the current one that comes before it in the tree, if any. This is the sibling that is formally tracked because it’s the one that would adopt the current heading whenever the current heading is demoted. If you want a list of all siblings of the current heading, you can do this:
    siblings = [c for c in current_heading.parent.children if c is not current_heading]
  - level is the heading’s level, with 1 being the top level and each sub-level after that being incremented by 1 (the heading’s level is the number of “stars” before its headline).
- Important methods of a Heading object:
  - clocking. This returns a list of Clocking objects, parsed from the heading’s LOGBOOK drawer, if any. You can also pass the optional boolean parameter include_children, which, when True, includes clocking information of this heading’s children as well.
  - get_all_properties. This returns a dict of all properties of the heading, whether directly defined or inherited from the heading’s ancestors. When a property is defined more than once, the latest-defined value wins.
  - add_child accepts a Heading object to add as a child to the current heading. The optional boolean parameter new should be set to True when this is a new heading that was created and needs to be assigned a parent. It should be set to False (default) when the addition of a child is due to a promotion/demotion operation.
  - remove_child accepts a heading object and deletes it from the current heading’s children if it’s a child of the current heading.
  - promote promotes the current heading one level. If the heading has children, they would be orphaned, so this raises a ValueError. Technically, Org allows you to have, say, level 3 headings under a level 1 heading, but orgmunge does not allow this, to make parsing the tree easier.
  - promote_tree promotes the current heading and all its descendants. Use this if the heading you want to promote has children.
  - demote demotes the current heading one level. If the current heading has no sibling to adopt it, the demotion attempt fails and raises a ValueError.
  - demote_tree is the equivalent of promote_tree for demotion.
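The promotion and demotion rules above are easier to follow on a toy model. The ToyHeading class below is a hypothetical stand-in written for this illustration, not orgmunge's Heading class; it only models parent, children, the preceding sibling, and the two documented ValueError cases. Demoting a heading that has children is not handled here (the text describes demote_tree for that case).

```python
class ToyHeading:
    """Toy stand-in for a heading; not orgmunge's Heading class."""

    def __init__(self, title, level=1):
        self.title = title
        self.level = level
        self.parent = None
        self.children = []

    @property
    def sibling(self):
        """The sibling that comes just before this heading, if any."""
        if self.parent is None:
            return None
        index = self.parent.children.index(self)
        return self.parent.children[index - 1] if index > 0 else None

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def demote(self):
        adopter = self.sibling
        if adopter is None:
            raise ValueError("no preceding sibling to adopt this heading")
        self.parent.children.remove(self)
        adopter.add_child(self)  # the preceding sibling becomes the parent
        self.level += 1

    def promote(self):
        if self.children:
            raise ValueError("promoting would orphan the children")
        if self.parent is None or self.parent.parent is None:
            raise ValueError("already at the top level")
        grandparent = self.parent.parent
        position = grandparent.children.index(self.parent) + 1
        self.parent.children.remove(self)
        self.parent = grandparent
        grandparent.children.insert(position, self)  # lands right after its old parent
        self.level -= 1


root = ToyHeading("ROOT", level=0)
first, second = ToyHeading("First"), ToyHeading("Second")
root.add_child(first)
root.add_child(second)
second.demote()  # "Second" is adopted by "First"
```

In this model, calling first.demote() raises ValueError because nothing precedes it, and first.promote() raises ValueError because it now has a child.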
- Important attributes of a Headline object:
  - done is a boolean attribute that determines whether the headline is in one of the done states. You can’t set this attribute directly.
  - level is the headline’s level (the number of “stars” before the title).
  - comment is a boolean attribute that determines whether a headline is commented out (by having the keyword COMMENT inserted before the title).
  - todo returns/sets the headline’s todo state. You can set it yourself but it has to be one of the values of self._todo_states or self._done_states.
  - cookie returns/sets the headline’s cookie. See Cookie Objects.
  - priority returns/sets the headline’s priority.
- Important methods of a Headline object:
  - promote decreases the level by the number given by the parameter n (default 1).
  - demote acts like promote but increases the level by n instead.
  - toggle_comment toggles the state of whether or not a headline is commented out using the COMMENT keyword.
  - comment_out ensures the headline is commented out using the COMMENT keyword.
  - uncomment ensures the headline is not commented out using the COMMENT keyword.
  - raise_priority increases the headline’s priority by 1.
  - lower_priority decreases the headline’s priority by 1.
- A Scheduling object has 6 attributes for the 3 possible scheduling keywords (3 are aliases of the other 3):
  - CLOSED, closed
  - SCHEDULED, scheduled
  - DEADLINE, deadline
- Each attribute, when queried, will return either None or a TimeStamp object representing the timestamp associated with this particular scheduling keyword. You can set the attributes directly but they have to be set to a TimeStamp object.
- A Drawer object has only 2 attributes: name and contents. The contents attribute is simply a list of lines making up the drawer contents. When you modify a heading’s properties attribute, its PROPERTIES drawer gets updated accordingly.
- The Clocking objects have 3 attributes: start_time, end_time and duration. Only the first 2 can be set. When setting either, you should pass a string following the Org time format; namely, ‘%Y-%m-%d %a %H:%M’ (see the strftime(3) man page for an explanation of the format codes).
- If end_time is None, the duration is calculated from the start_time up to the current moment.
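The time format and the open-clock rule can be tried out with the standard library alone. The sketch below does not use orgmunge; clock_duration is a hypothetical helper, and its now parameter is an addition made here so the open-clock case can be checked deterministically.

```python
from datetime import datetime

# The Org time format quoted above. Note that %a is locale-dependent.
ORG_TIME_FORMAT = "%Y-%m-%d %a %H:%M"


def clock_duration(start_time, end_time=None, now=None):
    """Hypothetical helper: time elapsed between two Org-formatted time strings."""
    start = datetime.strptime(start_time, ORG_TIME_FORMAT)
    if end_time is None:
        # Open clock: measure up to the current moment (or an injected `now`).
        end = now if now is not None else datetime.now()
    else:
        end = datetime.strptime(end_time, ORG_TIME_FORMAT)
    return end - start


elapsed = clock_duration("2024-01-15 Mon 09:00", "2024-01-15 Mon 10:30")
still_running = clock_duration("2024-01-15 Mon 09:00", now=datetime(2024, 1, 15, 9, 45))
```

Both results are datetime.timedelta values, so they can be compared or summed directly.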
- The only attribute of a Priority object, priority, can be set directly by the user and can be one of only 3 strings: ‘A’, ‘B’ or ‘C’. Set it to None to remove the priority from the Heading.
- The methods _raise and _lower will raise or lower the priority.
- If the priority is None, raising it sets it to ‘A’ and lowering it sets it to ‘C’.
- Important attributes of a TimeStamp object:
  - start_time and end_time can be queried and set by the user. You can set them by supplying a string, a datetime object or None.
  - repeater returns a timestamp repeater string such as ‘+1w’. Can also be set by the user.
  - deadline_warn acts similarly to repeater and represents the number of days before a deadline to warn the user of an upcoming deadline.
  - active is a boolean property and decides whether the time stamp will be printed with [] or <> delimiters. Can be set directly by the user.
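As a rough picture of what these attributes control, the sketch below renders a timestamp string by hand. It does not use orgmunge; format_timestamp is a hypothetical helper, and it relies on the Org convention that active timestamps use <> while inactive ones use [].

```python
from datetime import datetime


def format_timestamp(moment, active=True, repeater=None):
    """Hypothetical helper rendering an Org-style timestamp string."""
    body = moment.strftime("%Y-%m-%d %a %H:%M")
    if repeater:
        body = f"{body} {repeater}"  # e.g. '+1w' for a weekly repeat
    opening, closing = ("<", ">") if active else ("[", "]")
    return f"{opening}{body}{closing}"


meeting = datetime(2024, 1, 15, 9, 0)
active_stamp = format_timestamp(meeting, active=True, repeater="+1w")
inactive_stamp = format_timestamp(meeting, active=False)
```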
- Cookie objects represent progress on the current Heading.
- They can be of type ‘percent’ (e.g. [50%]) or of type ‘progress’ (e.g. [2/4]).
- Important attributes:
  - cookie_type: can only be one of ‘percent’ or ‘progress’. Can be set directly by the user.
  - m and n represent the progress as the ratio m/n. If the cookie type is ‘percent’, n is 100. When changing cookie_type, m and n are converted accordingly.
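The two cookie types are two renderings of the same ratio. The sketch below shows that relationship without using orgmunge; format_cookie is a hypothetical helper, and rounding the percentage down is an assumption made for this example rather than documented behaviour.

```python
def format_cookie(m, n, cookie_type="progress"):
    """Hypothetical helper rendering a progress cookie from the ratio m/n."""
    if cookie_type == "progress":
        return f"[{m}/{n}]"
    if cookie_type == "percent":
        # Assumption: integer percentage, rounded down; an empty total shows 0%.
        percent = 100 * m // n if n else 0
        return f"[{percent}%]"
    raise ValueError("cookie_type must be 'percent' or 'progress'")


as_progress = format_cookie(2, 4)
as_percent = format_cookie(2, 4, cookie_type="percent")
```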
- The ability to modify the tree was the main reason I wrote this package. Most of the attributes of the tree objects can be modified directly by the user.
- Use the promote* and demote* methods of the Heading objects to change Heading levels.
- To rearrange headings, note that a Heading’s children attribute is a list whose ordering is important: in other words, the tree will be written back to a file with the order each Heading’s children are in. So the user can rearrange the headings of the same level by assigning the children attribute of their parent to a different order of child headings. It’s up to the user to update the child headings’ sibling attributes appropriately.
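Keeping the sibling links consistent after a reorder is mechanical, so it can be wrapped in a small function. The sketch below is an illustration on stand-in objects: reorder_children is a hypothetical helper, and SimpleNamespace instances take the place of real Heading objects so the example runs without orgmunge.

```python
from types import SimpleNamespace


def reorder_children(parent, new_order):
    """Hypothetical helper: reorder a parent's children and refresh `sibling` links."""
    if sorted(map(id, new_order)) != sorted(map(id, parent.children)):
        raise ValueError("new_order must contain exactly the existing children")
    parent.children = list(new_order)
    previous = None
    for child in parent.children:
        child.sibling = previous  # the sibling is the heading just before this one
        previous = child


# Stand-in objects; real code would use Heading instances instead.
a, b, c = (SimpleNamespace(title=t, sibling=None) for t in "abc")
parent = SimpleNamespace(children=[a, b, c])
reorder_children(parent, [c, a, b])
```

After the call, c comes first and has no preceding sibling, while a and b point at c and a respectively.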
- You can use the Org object’s write method to write out the tree to a file whose name you supply to the method:
    from orgmunge import Org
    agenda = Org('/path/to/agenda.org')
    # Do something with agenda...
    agenda.write('/path/to/modified_agenda.org')
The convenience method Org.get_all_headings walks the Org tree
depth-first and returns a generator of all the headings in the tree in
the order in which they occur.
You can use Org.filter_headings(func), where func is any arbitrary
predicate, and get a generator of all headings satisfying the predicate.
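A depth-first walk combined with a predicate filter can be sketched on a toy tree. This is not orgmunge's implementation: walk, filter_tree and node are hypothetical names, and SimpleNamespace objects with title and children attributes stand in for headings.

```python
from types import SimpleNamespace


def walk(heading):
    """Yield every descendant of `heading` depth-first, in document order."""
    for child in heading.children:
        yield child
        yield from walk(child)


def filter_tree(root, func):
    """Yield the headings for which the predicate `func` returns a true value."""
    return (heading for heading in walk(root) if func(heading))


def node(title, *children):
    """Build a stand-in heading with the given title and children."""
    return SimpleNamespace(title=title, children=list(children))


root = node("ROOT", node("A", node("A1"), node("A2")), node("B"))
titles = [h.title for h in walk(root)]
with_children = [h.title for h in filter_tree(root, lambda h: h.children)]
```

Both functions are lazy, so nothing is visited until the generator is consumed, which matches the generator-returning behaviour described above.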
Use Org.get_headings_by_title to search for a heading with the given title:
    Org.get_headings_by_title(search_string, exact=False, re_flags=0)
search_string is what’s searched for in the title. It’s interpreted as a
regex unless exact is set to True, in which case the function returns
headings whose title matches the search string exactly. re_flags are
flags passed to re.search. This argument is ignored if exact is True.
The method uses filter_headings under the hood, so it returns a generator
of matching headings.
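The difference between regex and exact matching is easiest to see on plain strings. The predicate below is a hypothetical sketch that follows the documented parameters; it is not taken from orgmunge's source.

```python
import re


def title_matches(title, search_string, exact=False, re_flags=0):
    """Hypothetical predicate following the documented parameters."""
    if exact:
        return title == search_string  # re_flags is ignored in this branch
    return re.search(search_string, title, re_flags) is not None


partial = title_matches("Weekly review", "review")
strict = title_matches("Weekly review", "review", exact=True)
caseless = title_matches("Weekly review", "^weekly", re_flags=re.IGNORECASE)
```

Because the search string is a regex by default, characters such as `.` or `(` in a title need escaping (for example with re.escape) unless exact=True is used.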
Use Org.get_heading_by_path to search for a heading with the given path:
    Org.get_heading_by_path(path, exact=False, re_flags=0)
path is a list of heading titles. Each member is interpreted the same
way the search_string argument of get_headings_by_title is
interpreted. This function returns the first heading of the tree that
matches the given path, or None if no such heading is found.
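A path search of this kind can be sketched as a recursive descent over a toy tree. The code below is not orgmunge's implementation: find_by_path and node are hypothetical names, and it assumes the first path element names a top-level heading, which the text above does not spell out.

```python
import re
from types import SimpleNamespace


def find_by_path(root, path, exact=False, re_flags=0):
    """Hypothetical helper: first heading whose chain of titles matches `path`."""

    def matches(title, pattern):
        if exact:
            return title == pattern
        return re.search(pattern, title, re_flags) is not None

    def search(heading, remaining):
        for child in heading.children:
            if matches(child.title, remaining[0]):
                if len(remaining) == 1:
                    return child
                found = search(child, remaining[1:])
                if found is not None:
                    return found
        return None

    return search(root, list(path)) if path else None


def node(title, *children):
    """Build a stand-in heading with the given title and children."""
    return SimpleNamespace(title=title, children=list(children))


root = node(
    "ROOT",
    node("Projects", node("Alpha"), node("Beta")),
    node("Archive", node("Alpha")),
)
archived = find_by_path(root, ["Archive", "Alpha"], exact=True)
beta = find_by_path(root, ["proj", "beta"], re_flags=re.IGNORECASE)
missing = find_by_path(root, ["Missing"])
```

The path disambiguates headings that share a title: the two "Alpha" headings above are told apart by their parent.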