org-similarity
is a package to help Emacs org-mode users discover similar or related files. Under the hood, it uses Python and scikit-learn for text feature extraction, and nltk for text pre-processing. More specifically, this package provides a function to recursively scan a given directory for org
files, clean their content by stripping the front matter and some undesired characters, tokenize them, replace each token with its respective linguistic stem, generate a TF-IDF sparse matrix, and calculate the cosine similarity between these documents and the buffer you are currently working on.
Clone the repository:
git clone https://github.com/brunoarine/org-similarity
Then load the package by adding it to your load-path
in your init.el
:
(add-to-list 'load-path "/path/to/org-similarity")
(require 'org-similarity)
Or, with use-package
:
(use-package org-similarity
:load-path "path/to/org-similarity")
There are a few variables that can be set to customize how org-similarity
operates and generates the list of similar documents:
;; Directory to scan for possibly similar documents.
;; org-roam users might want to change it to `org-roam-directory'.
(setq org-similarity-directory org-directory)
;; The language passed to the Snowball stemmer in the `nltk' package. The
;; following languages are supported: Arabic, Danish, Dutch, English, Finnish,
;; French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian,
;; Spanish and Swedish.
(setq org-similarity-language "english")
;; How many similar entries to list at the end of the buffer.
(setq org-similarity-number-of-documents 10)
;; Whether to prepend the list entries with similarity scores.
(setq org-similarity-show-scores nil)
;; Whether the resulting list of similar documents will point to ID property or
;; filename. Default it nil.
;; However, I recommend setting it to `t' if you use `org-roam' v2.
(setq org-similarity-use-id-links nil)
;; Scan for files inside `org-similarity-directory' recursively.
(setq org-similarity-recursive-search nil)
In a buffer with an open org-mode
file, type M-x org-similarity-insert-list RET
.
If this is the first time you run that command, it will ask you to install the necessary Python dependencies. By pressing y
to continue, it will create a virtual environment under the hood and download the requirements automatically. Once finished, you can close the installation log buffer, and run the command again.
- Automated installation of Python dependencies (using virtual environments).
- Better
org-roam
v2 compatibility. orgparse
to parse org-mode files.- Refactored and optimized Python code.
- Alpha release of the package.
- Tested with
org-roam
v1.