Plugin for beets that uses supplemental files in imported directories to improve MusicBrainz matches for untagged data.
Whenever beets tries to identify your music, it queries MusicBrainz using tags from your music files. The query returns only the few best matches of the many possible results, however, so the better your tags, the more likely you are to get a good match.
But one of the reasons we're using beets to begin with is to get those tags; music is often tagged with only the most essential data (i.e., album and artist), lacking tags that could actually identify the specific edition of an album. That puts us in a chicken-and-egg situation: beets can't accurately identify the release until it has relevant tags, but it can't assign relevant tags without knowing the release!
In other words, while beets is an excellent tool, it's only as useful as the data it has available. It's common to store extra data in separate text or JSON files, but that data isn't read by beets as it's not included in music files themselves. If only there were a way to feed this origin data to beets to supplement our tags to accurately identify editions…
Enter originquery
.
This plugin relies on cutting-edge beets features to work. In particular, your beets installation must support the
extra_tags
setting, which is not yet in
an official beets release. Until beets v1.5.0 is released with the commit adding this
feature, you must install the latest
beets manually:
$> pip install git+https://github.com/beetbox/beets
$> beet --version
beets version 1.5.0
Once you have the latest and greatest beets, you can install this plugin:
$> pip install git+https://github.com/x1ppy/beets-originquery
Next, add the following section to your beets config file to enable improved MediaBrainz queries from tags:
musicbrainz:
extra_tags: [year, catalognum, country, media, label]
Finally, add originquery
to the plugins
section of your beets config file, creating it if it doesn't exist:
plugins:
- originquery
originquery
reads an origin file at the root of each album directory when music is imported. The origin file can be
either a text, JSON, or YAML file. Beyond that, the format of the file is entirely user-defined and is specified in the
originquery
configuration.
Your beets configuration must contain a section with the following fields:
originquery:
origin_file: <origin_file_name>
tag_patterns:
<tag1>: <pattern1>
<tag2>: <pattern2>
The origin_file
supports glob wildcard characters. So, for instance, if you use a date scheme for your origin file
naming (e.g., origin-2020025.txt
), you could specify origin_file: 'origin-*.txt'
here. If the pattern matches
multiple files, the first file in the alphanumerically sorted list of results will be used.
The tags under tag_patterns
can be any combination of the following tags:
media
(CD, vinyl, …)year
(edition year, not original release year)country
(US, Japan, …)label
(Epic, Atlantic, …)catalognum
(ABC-XYZ, 102030, …)albumdisambig
(Remastered, Deluxe Edition, …)
By default, the origin file parser will be determined by its file extension. .yaml
will be parsed as YAML, .json
will be parsed as JSON, and generic text parsing will be used otherwise. If your file format doesn't match its
extension, you can override the file type with the origin_type
config option, setting the type to either yaml
,
json
, or text
:
originquery:
...
origin_type: <yaml|json|text>
The patterns used will depend on your origin file type as outlined below.
When using text origin files, the tag_patterns
pattern must be a regular expression containing a single match group
corresponding to the value for the given tag.
As an arbitrary example, you might have origin files that look like the following:
media=CD
year=1988
label=Mobile Fidelity Sound Lab
catalognum=UDCD 517
In this case, your beets config would look like this:
originquery:
origin_file: origin.txt
tag_patterns:
media: 'media=(.+)'
year: 'year=(\d{4})'
label: 'label=(.+)'
catalognum: 'catalognum=(.+)'
This means that you have a file named origin.txt
at the root of each album directory, and the media
, year
, label
,
and catalognum
tags will be parsed from this file and used by beets to query MusicBrainz. In this case, each tag and
value would be listed in the origin file separated by an =
(i.e., <tag>=<value>
) as shown in the example. Of course,
you're free to use a completely different formatting scheme if you update the patterns accordingly.
With JSON origin files, the tag_patterns
pattern must be a JSONPath
expression that points to the value for the given tag.
As an arbitrary example, you might have origin files that look like the following:
{
"mydata": {
"media": "CD",
"year": 1988,
"label": "Mobile Fidelity Sound Lab",
"catalognum": "UDCD 517"
}
}
In this case, your beets config would look like this:
originquery:
origin_file: origin.json
tag_patterns:
media: '$.mydata.media'
year: '$.mydata.year'
label: '$.mydata.label'
catalognum: '$.mydata.catalognum'
This means that you have a file named origin.json
at the root of each album directory, and the media
, year
, label
,
and catalognum
tags will be parsed from this file and used by beets to query MusicBrainz. In this case, the tag and
value mappings would be defined in an object literal under the mydata
key at the root of the object as shown in the
example. Of course, you're free to use a completely different schema if you update the patterns accordingly.
Like JSON origin files, YAML files use JSONPath for tag_patterns
.
For example, with origin files that look like the following:
mydata:
media: CD
year: 1988
label: Mobile Fidelity Sound Lab
catalognum: UDCD 517
you would use the same JSONPath-based tag_patterns
config as JSON files (see above). origin_file
would of course
point to origin.yaml
instead of origin.json
.
Just as a baseline, let's first try a beets import without originquery
. We'll import this
edition of Led Zeppelin's Houses of the Holy:
$> beet import ~/music
/home/x1ppy/music/(1973) Houses Of The Holy [2014 Remaster] (8 items)
Correcting tags from:
Led Zeppelin - Houses Of The Holy
To:
Led Zeppelin - Houses of the Holy
URL:
https://musicbrainz.org/release/3ccb4cb2-940a-4e2e-b1fd-4c0b7483280f
(Similarity: 100.0%) (12" Vinyl, 1973, US, Atlantic, SD 7255)
* The Song Remains The Same -> The Song Remains the Same
* Over The Hills And Far Away -> Over the Hills and Far Away
* D'yer Mak'er -> D’yer Mak’er
[A]pply, More candidates, Skip, Use as-is, as Tracks, Group albums, Enter search, enter Id, aBort?
Nice, 100%! A perfect match…or is it?
On closer inspection, you'll notice that this is actually a very different edition than the one we're importing. beets is reporting the media as 12" Vinyl (instead of CD), the edition year is 1973 (instead of 2014), and the catalog number is different. No bueno.
Now, let's compare that to a query with originquery
enabled:
$> beet import ~/music
/home/x1ppy/music/(1973) Houses Of The Holy [2014 Remaster] (8 items)
originquery: Using origin file /home/x1ppy/music/(1973) Houses Of The Holy [2014 Remaster]/origin.txt
originquery: ╔════════════════╤═════════════╤═════════════╗
originquery: ║ Field │ Tagged Data │ Origin Data ║
originquery: ╟────────────────┼─────────────┼─────────────╢
originquery: ║ Media │ │ CD ║
originquery: ║ Edition year │ 1973 │ 2014 ║
originquery: ║ Record label │ │ Atlantic ║
originquery: ║ Catalog number │ │ 8122795828 ║
originquery: ║ Edition │ │ Remastered ║
originquery: ╚════════════════╧═════════════╧═════════════╝
Correcting tags from:
Led Zeppelin - Houses Of The Holy
To:
Led Zeppelin - Houses of the Holy
URL:
https://musicbrainz.org/release/51a4e8b4-1af1-4daf-a746-ac1c7206dd02
(Similarity: 100.0%) (CD, 2014, XE, Atlantic, 8122795828)
* The Song Remains The Same -> The Song Remains the Same
* Over The Hills And Far Away -> Over the Hills and Far Away
* D'yer Mak'er -> D’yer Mak’er
[A]pply, More candidates, Skip, Use as-is, as Tracks, Group albums, Enter search, enter Id, aBort?
Another 100% match! This time, though, all of the fields reported by beets exactly match the ones we were looking for. Success!
You'll also notice the shiny new table shown with the beets result. This gives us a handy reference for tags: the Tagged
Data column lists the data beets found in the music files, and the Origin Data column lists the data originquery
pulled from the origin file. With this information on hand, it's now clear why beets wasn't able to match the proper
edition: the music tags don't contain any tags that could actually identify the specific release!
Occasionally, you might see originquery
complain about conflicts between tagged data and origin data:
/home/x1ppy/music/Billy Joel - 1978 - 52nd Street (9 items)
originquery: Using origin file /home/x1ppy/music/Billy Joel - 1978 - 52nd Street/origin.txt
originquery: ╔════════════════╤═════════════╤════════════════╗
originquery: ║ Field │ Tagged Data │ Origin Data ║
originquery: ╟────────────────┼─────────────┼────────────────╢
originquery: ║ Media │ │ CD ║
originquery: ║ Edition year │ 1978 │ 2010 ║
originquery: ║ Record label │ Columbia │ Audio Fidelity ║
originquery: ║ Catalog number │ IDK 35609 │ AFZ 095 ║
originquery: ╚════════════════╧═════════════╧════════════════╝
originquery: Origin data conflicts with tagged data.
Tagging:
Billy Joel - 52nd Street
URL:
https://musicbrainz.org/release/6942718c-2fd2-4227-a882-130c500806f5
(Similarity: 100.0%) (CD, 1978, CA, Columbia, IDK 35609)
[A]pply, More candidates, Skip, Use as-is, as Tracks, Group albums, Enter search, enter Id, aBort?
This happens if either the music is mistagged or the origin file contains the wrong origin data. Here, we see the tagged
catalog number is IDK 35609
, but the origin data is AFZ 095
. These are clearly different editions, and it wouldn't
make sense to try to merge them to search MusicBrainz, so originquery
chooses just one set of values to query
MusicBrainz and ignores the other.
By default, originquery
uses the tagged data in the case of a conflict. This behavior can be changed by setting
use_origin_on_conflict
to yes
in the beets config:
originquery:
...
use_origin_on_conflict: yes
1.0.2 - 2020-04-06
- Added support for YAML origin files
1.0.1 - 2020-03-25
- Added support for glob patterns in
origin_file
1.0.0 - 2020-03-23
- Initial release