Convert KOReader PDF highlights so that the highlights and annotations can be used in other software.
Supported formats:
- Markdown
- obsidian-annotator
- baked in (embedded directly in PDF file)
Feel free to request other formats!
Before running this script, ensure you have the following prerequisites installed:
- Lua: The script requires Lua with the dkjson package. Install it using luarocks: luarocks install dkjson
- Python Dependencies: Install all the Python packages imported by the script. You can use pip to install these packages: pip3 install fuzzywuzzy PyPDF2 pdfminer lupa PyMuPDF
Note: You might need additional dependencies depending on your system setup. If the script reports a missing module when you run it, install that module as well.
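To see which of the Python packages above are still missing before running kohico, a small check along these lines may help. It is only a sketch and not part of kohico: the function name `missing_packages` is made up for illustration, and the mapping assumes the usual import names for these packages (PyMuPDF is imported as `fitz`).

```python
from importlib.util import find_spec

# pip package name -> module name used in an import statement.
# Assumption: these are the usual import names for the packages listed above.
REQUIRED = {
    "fuzzywuzzy": "fuzzywuzzy",
    "PyPDF2": "PyPDF2",
    "pdfminer": "pdfminer",
    "lupa": "lupa",
    "PyMuPDF": "fitz",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages, try: pip3 install " + " ".join(missing))
else:
    print("All listed Python dependencies were found.")
```

The check only looks for the modules without importing them, so it does not verify that Lua or dkjson are set up correctly.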
usage: python3 kohico.py [-h] [--template TEMPLATE] file_path [output_format]
Convert KOReader highlights, either by baking them into the PDF, converting for use with the Annotator plugin for Obsidian, or exporting to Markdown.
positional arguments:
file_path Path to the PDF file. You can also give the path directly to a metadata.pdf.lua file, in which case not all conversion types will be available.
output_format Comma-separated types of output format(s) ('obsidian-annotator'/'obs' for Obsidian Annotator, 'bake' for baking into the PDF, 'markdown'/'md' for markdown output.). Default is 'obsidian-annotator,markdown'.
options:
-h, --help show this help message and exit
--template TEMPLATE Path to an optional Markdown template file.
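For example, `python3 kohico.py book.pdf md,bake` would request Markdown output and a baked PDF for a file called book.pdf (the file name is just an illustration). The following sketch shows one way the comma-separated output_format value and its aliases could be normalized. The names `ALIASES` and `parse_output_formats` are hypothetical, and the actual script may handle this differently.

```python
# Aliases taken from the help text above; the canonical names are a choice
# made for this sketch, not necessarily what kohico uses internally.
ALIASES = {
    "obsidian-annotator": "obsidian-annotator",
    "obs": "obsidian-annotator",
    "bake": "bake",
    "markdown": "markdown",
    "md": "markdown",
}


def parse_output_formats(value="obsidian-annotator,markdown"):
    """Split a comma-separated format string into canonical format names."""
    formats = []
    for raw in value.split(","):
        name = raw.strip().lower()
        if not name:
            continue  # tolerate stray commas such as "md,"
        if name not in ALIASES:
            raise ValueError(f"Unknown output format: {raw!r}")
        canonical = ALIASES[name]
        if canonical not in formats:  # drop duplicates, keep order
            formats.append(canonical)
    return formats
```

Calling `parse_output_formats("md, bake")` yields the Markdown and bake formats in the order given, and calling it without an argument falls back to the documented default.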
- Locate Your PDF: Find the PDF in your KOReader directory. You should see a directory named <PDFNAME>.sdr next to it.
- Move Metadata File: Move the metadata.pdf.lua file from the .sdr directory to the same location as the corresponding PDF (if converting for obsidian-annotator, the PDF must also be in your Obsidian vault).
- Run the Script: Execute the script, passing the path to the PDF as the first argument (or alternatively the path to the metadata.pdf.lua file, in which case only markdown is available as output format) and your desired output format as the second. You can request multiple outputs by separating them with commas. The script will generate a file named <PDFNAME>_anno.md, <PDFNAME>_obs-anno.md or <PDFNAME>_anno.pdf next to the original PDF.
- View Your Highlights: Your kohiconverted highlights can now be viewed.
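The "Move Metadata File" step can also be scripted. This is a minimal sketch, not part of kohico: `stage_metadata` is a hypothetical helper, and it assumes that the sidecar directory for book.pdf is named book.sdr (the PDF name without its extension).

```python
import shutil
from pathlib import Path


def stage_metadata(pdf_path):
    """Move metadata.pdf.lua from the .sdr directory to sit next to the PDF."""
    pdf = Path(pdf_path)
    # Assumption: book.pdf has its sidecar directory at book.sdr.
    sidecar = pdf.with_suffix(".sdr") / "metadata.pdf.lua"
    target = pdf.parent / "metadata.pdf.lua"
    if not sidecar.is_file():
        raise FileNotFoundError(f"No metadata file found at {sidecar}")
    shutil.move(str(sidecar), str(target))
    return target
```

Because the file is moved rather than copied, KOReader will no longer find it in the .sdr directory afterwards. Swap `shutil.move` for `shutil.copy2` if you would rather keep the original in place.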
Not everyone wants their exported Markdown annotations formatted the way I like them, so the output can be changed using a template.
There are two parts to a template: a global part and an annotation part. The global part is output only once, at the beginning of the resulting Markdown document. The annotation part is repeated for each annotation. The two parts are separated by a line containing %annotation.
Here's how to do it:
- Create a Markdown file; you could call it template.md.
- Populate it however you wish, separating the global part from the annotation part by a line containing %annotation. Wrap available variables in curly brackets ({}), as such (the following is the default template):
# {title} annotations
File: {filename}
%annotation
---
Page {page_number}
=={highlight}==
{text}
- Pass the template file along to kohico using --template <PATH_TO_TEMPLATE_FILE>. Remember, this only works with conversion type markdown (or md).
The following variables are available for use in the template:
- filename: The file name.
- title: The title of the PDF document.
- author: The author of the PDF document.
- highlight: The actual highlighted text.
- text: Any written comment attached to the highlight.
- page_number: The page on which the highlight is found.
- unique_id: A unique ID (meant for use by obsidian-annotator) if you need it.
- iteration: Starting from 1, it represents the index of the current annotation.
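To make the template mechanics concrete, here is a rough sketch of how a template could be split at the %annotation line and filled in. It assumes plain `str.format` substitution, which may not be how kohico actually does it. The `render` function and the sample values are invented for illustration.

```python
# The default template as shown above.
DEFAULT_TEMPLATE = """# {title} annotations
File: {filename}
%annotation
---
Page {page_number}
=={highlight}==
{text}
"""


def render(template, document, annotations):
    """Render the global part once, then the annotation part per annotation."""
    global_part, _, annotation_part = template.partition("%annotation\n")
    output = [global_part.format(**document)]
    for iteration, annotation in enumerate(annotations, start=1):
        # Document-level variables stay available inside each annotation.
        fields = {**document, **annotation, "iteration": iteration}
        output.append(annotation_part.format(**fields))
    return "".join(output)


document = {"title": "Example Book", "filename": "book.pdf", "author": "A. Writer"}
annotations = [
    {"highlight": "first passage", "text": "my note", "page_number": 3, "unique_id": "a1"},
    {"highlight": "second passage", "text": "", "page_number": 7, "unique_id": "b2"},
]
print(render(DEFAULT_TEMPLATE, document, annotations))
```

With this approach, a template that uses a variable not in the list raises a `KeyError`, and literal curly brackets in a template would have to be doubled ({{ and }}). Whether the real script behaves the same way is not stated here.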
The script accesses the highlight metadata from KOReader and converts it to JSON. It then iterates through every highlight and locates it in the PDF, so that it can extract the surrounding textual content.
Finally, it outputs these processed highlights and annotations in your desired format.
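The step of locating a highlight in the page text and pulling out its surroundings can be illustrated as follows. The dependency list includes fuzzywuzzy, which suggests the script uses fuzzy matching, but this sketch deliberately uses `difflib` from the standard library as a stand-in so that it runs without extra installs. The function `find_highlight` and its return format are hypothetical.

```python
from difflib import SequenceMatcher


def find_highlight(page_text, highlight, context_chars=40):
    """Approximately locate `highlight` in `page_text` and return its context.

    Returns None when the two strings share no characters at all.
    """
    matcher = SequenceMatcher(None, page_text, highlight, autojunk=False)
    match = matcher.find_longest_match(0, len(page_text), 0, len(highlight))
    if match.size == 0:
        return None
    # Estimate where the whole highlight starts from the longest shared run.
    start = max(0, match.a - match.b)
    end = min(len(page_text), start + len(highlight))
    return {
        "start": start,
        "end": end,
        "before": page_text[max(0, start - context_chars):start],
        "matched": page_text[start:end],
        "after": page_text[end:end + context_chars],
    }
```

For an exact match the returned span is the highlight itself. When the extracted PDF text differs slightly from what KOReader stored, the span is only an estimate anchored on the longest shared run, so a real implementation would likely want a similarity threshold as well.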
- The script needs access to both the PDF and the Lua metadata file to successfully convert the highlights (when only the metadata file is given, only Markdown output is available).
- Ensure that the Lua environment and all Python dependencies are correctly installed before running the script.
Feel free to open an issue for any bugs or feature requests, or submit a pull request if you've made improvements to the script.