voicenotes2org
is a Python script which collects WAV files in a given directory, sends them to Google Cloud Platform (GCP) for transcription, and then formats the resulting transcripts into a combined org file, including links back to the original audio.
Each note becomes a heading in the org file, and includes:
- The date and time of the note
- An org-link of type
voicenote
, which, when followed, plays the original audio file in EMMS - Google’s transcript of the note, broken down into 10-second segments. Each segment begins with a
voicenote
link which will play the original audio file at the time offset which corresponds to that segment.
This uses Google’s Cloud Speech-to-Text API, and you will need your own GCP account. Make sure you have a service account JSON.
Other than that, you’ll need python3
and ffmpeg
on your system. Only tested on Arch Linux.
Clone this repo, then install like so:
git clone https://github.com/bgutter/voicenotes2org
cd voicenotes2org
pip install . # optionally with sudo, depending on your system
It’s also on PyPI as voicenotes2org
, but not usually up to date there.
sudo pip install voicenotes2org
Transcription jobs can be defined on the command line, or in a config file.
CLI Example:
> voicenotes2org --voice_notes_dir=~/new-voice-notes/ --archive_dir=~/org/archived-voice-notes/ --org_transcript_file=~/org/unfiled-voice-notes.org
…or…
Config File:
> cat ~/.config/voicenotes2org/default.toml
voice_notes_dir="~/new-voice-notes/"
archive_dir="~/org/archived-voice-notes/"
org_transcript_file="~/org/unfiled-voice-notes.org"
> voicenotes2org
Note that, in the config file, all relative paths will be interpreted as relative to the config file. For example, “filename_regex.txt” in ~/.config/voicenotes2org/default.toml
will be treated as ~/.config/voicenotes2org/filename_regex.txt
.
In both case, the script will find every WAV file in ~/new-voice-notes/
, and transcribe them. After transcription, they will be moved to ~/org/archived-voice-notes/
. If ~/org/unfiled-voice-notes.org
does not exist, it will be created with an eval header statement which defines the voicenote
link type. If the file already exists, voicenotes2org will only append content, leaving existing content unmodified. There will be one new heading for each WAV file transcribed.
Optional Arguments
Option | Meaning |
---|---|
--gcp_credentials_path | Path to JSON file. If provided, use this to access the Google Speech-to-Text API. If missing, you must have configured the GOOGLE_APPLICATION_CREDENTIALS environment variable! |
--voicenote_filename_regex_path | Path to a text file containing a Python regex which will be used to match and parse voice note filenames. It MUST contain named groups for year, month, day, hour, minute, and ampm. All but ampm are local date/time (or, whatever you want, really), 12 hour clock. ampm should be either literally am or pm. This is an unsanitized input. Be smart. |
--max_concurrent_requests | Maximum number of concurrent transcription requests. |
--verbose | Default false. Print the name of WAV files currently being transcribed. |
--just_copy | Boolean. Default false. If true, don’t remove audio from original folder. |
If you prefer to avoid eval statements in your file headers, you may instead include this somewhere in your init code:
(org-link-set-parameters "voicenote"
:follow (lambda (content)
(cl-multiple-value-bind (file seconds)
(split-string content ":")
(emms-play-file file)
(sit-for 0.5)
(emms-seek-to (string-to-number seconds)))))
Formatted Output:
Plain Text:
# -*- eval: (org-link-set-parameters "voicenote" :follow (lambda (content) (cl-multiple-value-bind (file seconds) (split-string content ":") (emms-play-file file) (sit-for 0.5) (emms-seek-to seconds)))) -*-
#+TITLE: Unfiled Voice Notes
C-c C-o on any link to play clip starting from that offset.
* New Voice Note
[2020-01-01 Wed 00:52]
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:0][Archived Clip]]
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:0][00:00]] this is a second voice note I am talking into a phone right now roses are red violets are blue
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:10][00:10]] blah blah blah
* New Voice Note
[2020-01-01 Wed 00:52]
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:0][Archived Clip]]
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:0][00:00]] this is a voice note for testing this is the first one that I will do I'm going to talk about nothing
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:10][00:10]] because I don't know what else to say
* New Voice Note
[2020-01-01 Wed 00:53]
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:0][Archived Clip]]
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:0][00:00]] Mona Lisa lost her smile the painters hands are trembling now and if she's out
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:10][00:10]] there running wild it's just because I taught her how the Masterpiece that we had planned is laying shattered
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:20][00:20]] on the ground Mona Lisa lost her smile and the painters hands are trembling now and the eyes that used to burn for
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:30][00:30]] me now they no longer look my way and the love that used to be why it just got lost in yesterday
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:40][00:40]] and if she seems cold to the touch well there used to be burn a flame I gave to a little took
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:50][00:50]] too much til I erased the painter's name ... too much till I erased the painter's name
Unless you define your own regex file, WAV files must be named according to the following pattern:
.* YYYY-MM-DD H-MM AM|PM .*.wav
Where:
YYYY
is the year.MM
is zero-padded month.DD
is zero-padded day.H
is unpadded (sorry) hour in 12-hour format.MM
is zero-padded minute.AM|PM
is literally just “AM” or “PM”.- Everything is whitespace delimited.
Many corners have been cut in the making of this script. If literally anyone else ever uses this code, these issues might be worth fixing some day.
Wouldn’t be hard to figure out the file format, but Google’s transcription API requires non-WAV formats specify things like sample rate and encoding. I did not need this.
Google caps the duration of audio which has been inlined into the transcription request at 1 minute. Anything longer than that, and you need to configure a Google Cloud Storage bucket. I didn’t want to, so I split each voice note into 55-second chunks with a 5-second overlap.
For example, a 3 minute long voice note is actually transcribed in 4 separate chunks:
- 0:00 to 0:55 – 55 seconds
- 0:50 to 1:45 – 55 seconds, first 5 overlap
- 1:40 to 2:35 – 55 seconds, first 5 overlap
- 2:30 to 3:00 – 30 seconds, first 5 overlap
To reduce (or, maybe produce) confusion, I insert an ellipsis (…) into the transcription wherever we’re about to start inserting overlapped content. For example:
and we went to the store for some ... the store for some candy to bring with us
This is ugly and lazy and later versions might improve this.
This is how I integrate my voice recordings into org-mode.
Convenient Voice Recording
I record voice notes on my Android device using “Easy Voice Recorder”. I use this app specifically because it provides a system shortcut to toggle recording. The first invocation of this shortcut begins recording, and the second stops recording, saving the audio to a new WAV file. A third invocation would start recording again, but with another new file.
This app also lets you specify how audio files should be named, which makes it easy to encode date and time.
Most importantly, I use the “Button Mapper” app to bind a long-press of the volume-up key to this shortcut. This works even when the screen is off.
With this setup, ideas, tasks, and notes can be recorded instantly and effortlessly. Just long hold the volume up key, say whatever needs to be said, and long hold again to complete the file. No unlocking the phone, and no interacting with the touchscreen.
Alternatively, If you don’t mind carrying a second device, a dedicated voice recorder would work at least as well.
Syncing The Audio Files
I use Syncthing to sync the voice notes directory on my Android device to a directory on my PC. This is probably the easiest way to achieve near realtime syncing, and Syncthing is FOSS!
Alternatively, you can manually copy the files every evening over USB, or SSH, or Google Drive, or…well, you get the idea.
Transcription
In my org directory structure, I have a file dedicated to receiving transcribed, but not yet properly filed, voice notes. Let’s say that this is at ~/org/unfiled-voice-notes.org
. Let’s also assume that my untranscribed voice notes are synced – by Syncthing – to ~/new-voice-notes/
.
If I run the example command under the Basic Usage
heading, then absent any errors, ~/new-voice-notes/
will be cleared out. This frees up space on the phone, though otherwise isn’t all that important. What is important is that, for each processed audio file, a new heading will appended to ~/org/unfiled-voice-notes.org
. The audio file will now live in ~/org/archived-voice-notes/
, and any file links in the org entries will point to this location. Because the links are absolute, the headings can be moved around wherever you’d like and will not break.
Filing
Once voicenotes2org
has returned, you should open ~/org/unfiled-voice-notes.org
in Emacs, then use org-refile
to pop each entry into a more proper location in your org directory structure. Make sure you’ve configured org-refile-targets
first!