dmzoneill/lidarr-youtube-downloader

REQ: Ability to set percentage match cutoff and/or keyword blacklist such as "live" or whitelist "official" in title.

Opened this issue · 6 comments

I've gotten some bad matches on songs...typically someone recording the singer as a live concert. I'd like to have it only accept high percentage matches or whitelist things like the word "Official" or blacklist the word "Live" in the title.

The code uses Levenshtein distance to match names and titles, and only accepts a match ratio greater than 0.8.

Unfortunately this is never going to be perfect, though increasing the match ratio may get you better results.

But consider the following:

Say I record 2 songs:

  1. official song off the CD
  2. recording of me singing it in the shower.

If I then upload both of them to YouTube with the same name:

dave - the great title
dave - the great title

Which one is correct based on the name alone? The problem here is that user input is always at fault.

If you have some way of making a better determination, I can certainly implement it for you :)

The problem is I'm getting a lot of matches where it IS the artist singing the song, but it's at some club where you can barely hear them and a bunch of people are talking over them. I'm then having to go delete that particular track, and if I ever run ydl again it's presumably going to grab the same track. I'd rather set the ratio to 0.9 minimum and/or ignore tracks with "live" in the title. I don't see why allowing user-set conditions would be a problem? Perhaps via environment variables.
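To make the request concrete, here is a rough sketch of what user-set conditions via environment variables could look like. The variable names `MATCH_MIN_RATIO`, `TITLE_BLACKLIST` and `TITLE_WHITELIST` and both helper functions are made up for illustration; none of them exist in the project as far as this thread shows.

```python
import os
import re


def load_settings(env=os.environ):
    """Read hypothetical matching options from environment variables."""
    def words(name):
        # Comma-separated list, e.g. "live, karaoke" -> ["live", "karaoke"]
        raw = env.get(name, "")
        return [w.strip().lower() for w in raw.split(",") if w.strip()]

    return {
        "min_ratio": float(env.get("MATCH_MIN_RATIO", "0.8")),
        "blacklist": words("TITLE_BLACKLIST"),
        "whitelist": words("TITLE_WHITELIST"),
    }


def contains_word(title, word):
    """Whole-word, case-insensitive check so "live" does not hit "Alive"."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, title, flags=re.IGNORECASE) is not None


def title_allowed(title, settings):
    """Reject blacklisted titles; if a whitelist is set, require a hit."""
    if any(contains_word(title, w) for w in settings["blacklist"]):
        return False
    if settings["whitelist"]:
        return any(contains_word(title, w) for w in settings["whitelist"])
    return True


if __name__ == "__main__":
    settings = load_settings({"TITLE_BLACKLIST": "live, karaoke"})
    print(title_allowed("dave - the great title (Live at the club)", settings))
```

A keyword filter like this only helps when the uploader actually put "live" in the title, so it would complement the ratio check rather than replace it.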

I'll take a look today at adding some flexibility around matching.
But it won't solve the problem for you.

The code actually does the following already:

  1. searches
  2. iterates over the search results, checking the names using Levenshtein distance
  3. each search result gets a match ratio, eg 0.95
  4. from all the matches it picks the one with the highest match ratio
  5. finally, only accepts it if the match ratio is greater than 0.8
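
As a rough illustration of those steps (not the project's actual code), the selection could be sketched like this. The ratio formula here, `1 - distance / longest length`, is an assumption; the real code may normalise differently, and the function names are made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def match_ratio(wanted: str, candidate: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical (assumed normalisation)."""
    wanted, candidate = wanted.lower(), candidate.lower()
    longest = max(len(wanted), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(wanted, candidate) / longest


def pick_best(wanted, titles, min_ratio=0.8):
    """Score every result, keep the highest, accept only above min_ratio."""
    best_title, best_ratio = None, 0.0
    for title in titles:
        ratio = match_ratio(wanted, title)
        if ratio > best_ratio:
            best_title, best_ratio = title, ratio
    if best_ratio > min_ratio:
        return best_title, best_ratio
    return None, best_ratio


if __name__ == "__main__":
    results = ["dave - the great title (live)", "dave - the great title"]
    print(pick_best("dave - the great title", results))
```

Note that two uploads with the exact same title get the exact same ratio, so in this sketch whichever comes first in the search results wins.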

I'll add an option to raise the 0.8 threshold.
But I'm 100% sure you will still have people on YouTube uploading bootleg rips, live gigs, karaoke, whatever, as a perfectly named "artist - track name", and that will always match the search 100%.

One solution is to use a service like audiotag.info (based on a database of audio spectrograms) to attempt to identify some extracts from the downloaded YouTube video. Perhaps you could extract multiple samples from the same song, say at the start, middle, and end, and apply some rules, e.g. if all samples can be matched, then it is highly likely to be a YouTube video with just the song itself and not someone's karaoke party. I see audiotag.info has an API, or maybe there are better alternatives, but it should be possible to judge the quality of a downloaded YouTube song a little better than by matching just metadata, which is what Lidarr itself seems to be doing.
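
A minimal sketch of the "all samples must match" rule might look like the following. The `identify` callable is a placeholder for whatever recognition service ends up being used; I have not checked the audiotag.info API, so nothing here reflects its real interface, and the 15-second sample length is an arbitrary assumption.

```python
def sample_offsets(duration_s, sample_s=15.0):
    """Offsets (in seconds) for samples at the start, middle and end."""
    if duration_s <= sample_s:
        return [0.0]  # track too short for three separate samples
    last = duration_s - sample_s
    return [0.0, last / 2, last]


def looks_like_studio_track(duration_s, expected_title, identify, sample_s=15.0):
    """Accept only if every sample is identified as the expected track.

    identify(offset_s, length_s) is a hypothetical callable that returns
    the recognised title for that extract, or None if nothing matched.
    """
    for offset in sample_offsets(duration_s, sample_s):
        found = identify(offset, sample_s)
        if found is None or found.lower() != expected_title.lower():
            return False
    return True


if __name__ == "__main__":
    # Stand-in recogniser that always "hears" the right song.
    def fake_identify(offset_s, length_s):
        return "Dave - The Great Title"

    print(looks_like_studio_track(180.0, "dave - the great title", fake_identify))
```

Requiring all three samples to match is deliberately strict; a looser rule (say, two out of three) would tolerate a spoken intro but also let more bad recordings through.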

Could definitely look into such a solution :)
PRs are also welcome.
I'm working on a number of things currently.