TecCheck/FastLyrics

Use Deezer lyrics with time stamp

marsmathis opened this issue · 20 comments

Hi, just a suggestion (no idea whether this would strictly be allowed but I know it works technically):

You could source the lyrics from Deezer's API. The cool thing is that they have timestamped lyrics for many songs so you can parse the API response into a .LRC compliant file and then you could display the lyrics in sync with the song if you can read the current playback location from the notification. I have written the parser completely already and it works beautifully (and you don't even need an API key because that part of the API is accessible via their anonymous API call feature). There can obviously always be fallback display modes similar or identical to how it works currently but I think it would be cool to have an offline version of a service like Musixmatch or the like.

Let me know if you want to see the code and then let's talk about implementing it if you like it.

That sounds awesome. I'd love to have that in FastLyrics

Okay, so I’m gonna go through my code for generating LRC files from Deezer’s API response.

import requests
import json
from difflib import SequenceMatcher

We use requests for yoinking stuff from the internet, json for parsing and SequenceMatcher from difflib for analyzing how close the title of our song is to each search result on the API.

title = "Toxicity"

artist = "System of a Down"

The specific strings are just hardcoded for now, you could obviously easily set those values from somewhere else.

info_raw = requests.get("https://api.deezer.com/search", params={"q": title.lower() + " " + artist.lower()})  # params= URL-encodes the query, so titles containing "&" or "#" don't break it

info = json.loads(info_raw.text)

Here we just request a search via the API for our title and artist. We then parse the JSON response with json.loads. See https://api.deezer.com/search?q=toxicity+system%20of%20a%20down for an example of the structure of the response.

results = []
for i, result in enumerate(info["data"]):
    if (SequenceMatcher(None, result["title"], title).ratio() > 0.8
            and SequenceMatcher(None, result["artist"]["name"], artist).ratio() > 0.8):
        results.append(i)

This part is still a work in progress. Here we check how similar each result in the response is to our given strings. For example, if our search query from above is requested, we receive "Toxicity" by System of a Down as the first result, but we also get all other songs from the album of the same name as subsequent results. I haven’t conclusively checked whether a song other than “Toxicity” ever comes up as the first result when requesting something like this, but I wanted to make sure that if it does swap things around, we save the index of the correct result so we don’t mismatch lyrics.
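A minimal sketch of how this matching step could rank the candidates instead of only thresholding them. The helper names `similarity` and `best_matches` are made up for illustration, and the lowercasing is an added assumption that is not in the code above. The sample data is hypothetical and only mimics the `title` and `artist.name` fields of the search response.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(results, title, artist, threshold=0.8):
    """Return indices of results whose title and artist both clear the
    threshold, best combined score first."""
    scored = []
    for i, result in enumerate(results):
        title_score = similarity(result["title"], title)
        artist_score = similarity(result["artist"]["name"], artist)
        if title_score > threshold and artist_score > threshold:
            scored.append((title_score + artist_score, i))
    # Highest score first; ties keep the order the API returned them in.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]


# Hypothetical sample shaped like the "data" list of a search response.
sample = [
    {"title": "Chop Suey!", "artist": {"name": "System of a Down"}},
    {"title": "Toxicity", "artist": {"name": "System Of A Down"}},
]
print(best_matches(sample, "Toxicity", "System of a Down"))
```

Returning a ranked list rather than a single index leaves room to try the next candidate if the first one turns out to have no lyrics.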

authorization_token_url = 'https://auth.deezer.com/login/anonymous?jo=p'

authorization_token_response = requests.get(authorization_token_url)

authorization_token = json.loads(authorization_token_response.text)

Deezer uses an authentication workflow which normally requires an API key. However, they have this special “anonymous” login routine where you just request an authorization token as anonymous. This token obviously doesn’t let you play back music, but it does allow us to receive the lyrics, which is what we’re here for.

api_headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0",
        "Accept": "*/*",
        "Accept-Language": "en-US",
        "Content-Type": "application/json",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-site",
        "Sec-GPC": "1",
        "authorization": "Bearer " + authorization_token["jwt"]}

Here we just set the API POST headers including our anonymous login auth token we received earlier. I just went for a pretty random User-Agent, I’m sure you could change it if need be.

api_body = {"operationName":"SynchronizedTrackLyrics","variables":{"trackId":str(info["data"][results[0]]["id"])},"query":
"""query SynchronizedTrackLyrics($trackId: String!) {
  track(trackId: $trackId) {
    title
    ...SynchronizedTrackLyrics
    __typename
  }
}
fragment SynchronizedTrackLyrics on Track {
  id
  lyrics {
    ...Lyrics
    __typename
  }
  __typename
}
fragment Lyrics on Lyrics {
  id
  copyright
  writers
  synchronizedLines {
    ...LyricsSynchronizedLines
    __typename
  }
  __typename
}
fragment LyricsSynchronizedLines on LyricsSynchronizedLine {
  lrcTimestamp
  line
  lineTranslated
  milliseconds
  duration
  __typename
}
"""}

This is the actual API POST body where we tell Deezer what we actually want from the API. It’s been a while since I wrote this and I have to be honest, I don’t remember every detail. I’d have to look into the API docs to rebuild my memory on this completely.

api_response_raw = requests.post(api_url, headers = api_headers, json = api_body)

api_response = json.loads(api_response_raw.text)

This is just calling and JSON parsing the API response again.

track = info["data"][results[0]]  # the first search result that passed the similarity check
with open(artist + " – " + title + ".lrc", "w", encoding="utf-8") as myfile:
    myfile.write("[ti: " + track["title"] + "]\n")
    myfile.write("[al: " + track["album"]["title"] + "]\n")
    myfile.write("[ar: " + track["artist"]["name"] + "]\n")
    for line in api_response["data"]["track"]["lyrics"]["synchronizedLines"]:
        myfile.write(line["lrcTimestamp"] + line["line"] + "\n")

And here we finally build the .LRC file. LRC files have the following structure:

[ar:Lyrics artist]
[al:Album where the song is from]
[ti:Lyrics (song) title]
[au:Creator of the Songtext]
[length:How long the song is]
[by:Creator of the LRC file]
[offset:+/- Overall timestamp adjustment in milliseconds, + shifts time up, - shifts down i.e. a positive value causes lyrics to appear sooner, a negative value causes them to appear later]
[re:The player or editor that created the LRC file]
[ve:version of program]

[00:12.00]Line 1 lyrics
[00:17.20]Line 2 lyrics
[00:21.10][00:45.10]Repeating lyrics (e.g. chorus)
...
[mm:ss.xx]last lyrics line

Credit to Wikipedia for this example.

In this case we just write the title, album, and artist as ID tags and then we put each lyric line in a new line just as the file standard expects it.
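As a rough sketch, the same file-building step can be written as a function that returns the LRC text, which makes it easier to check without touching the disk. `build_lrc` is a made-up name, the sample lines are placeholders, and the code assumes that `lrcTimestamp` already includes the square brackets (which is how the code above uses it).

```python
def build_lrc(title, album, artist, synced_lines):
    """Return LRC text: ID tags first, then one timestamped line per lyric."""
    lines = [
        "[ti: " + title + "]",
        "[al: " + album + "]",
        "[ar: " + artist + "]",
    ]
    for entry in synced_lines:
        # Assumption: lrcTimestamp already looks like "[00:12.00]".
        lines.append(entry["lrcTimestamp"] + entry["line"])
    return "\n".join(lines) + "\n"


# Placeholder data shaped like the synchronizedLines list.
example_lines = [
    {"lrcTimestamp": "[00:12.00]", "line": "Line 1 lyrics"},
    {"lrcTimestamp": "[00:17.20]", "line": "Line 2 lyrics"},
]
print(build_lrc("Song title", "Album title", "Artist name", example_lines))
```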

So now we have a fully compliant LRC file for any song we wish, as long as Deezer has synchronized lyrics for that song. I have not checked yet what happens when a song does not have those synchronized lyrics (I expect either the Python kernel is gonna nope out with some error or it’s just gonna populate the LRC with the ID tags and leave the rest empty) but it’d be pretty trivial to just fall back to Genius or something to show static lyrics as the app currently does whenever the Deezer flow doesn’t deliver the expected result. I’m sure we could also write the code more nicely (this was written very quickly and dirtily some time ago because I wondered whether I could do it at all) and we could maybe put fallback logic in place. For example, when we loop through every response and record all response indices where title and artist both match at least 80%, it’s not impossible for there to be multiple results where this is true (in fact, for Toxicity by SoaD there are two results where it’s true, one on the album Toxicity and one on Rock Clássico with index 15). We could loop through the results list in case the LRC is empty. That’s just one possibility to make this more robust.
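The "loop through the results list" idea could look roughly like this. It is only a sketch: `first_with_synced_lyrics` is a made-up name, `fetch_lyrics` stands in for whatever function performs the POST request for one track ID, and `fake_fetch` is a stand-in so the example runs without network access.

```python
def first_with_synced_lyrics(candidate_ids, fetch_lyrics):
    """Return (track_id, synced_lines) for the first candidate that has
    synchronized lyrics, or (None, []) if none of them do."""
    for track_id in candidate_ids:
        try:
            response = fetch_lyrics(track_id)
            lyrics = response["data"]["track"]["lyrics"]
            synced = lyrics["synchronizedLines"] if lyrics else None
        except (KeyError, TypeError):
            # The response did not have the expected shape; try the next one.
            continue
        if synced:
            return track_id, synced
    return None, []


def fake_fetch(track_id):
    """Hypothetical stand-in for the real request: only track "2" has lyrics."""
    if track_id == "2":
        return {"data": {"track": {"lyrics": {"synchronizedLines": [
            {"lrcTimestamp": "[00:12.00]", "line": "Line 1 lyrics"},
        ]}}}}
    return {"data": {"track": {"lyrics": None}}}


print(first_with_synced_lyrics(["1", "2"], fake_fetch))
```

If every candidate comes back empty, the caller gets `(None, [])` and can fall back to static lyrics from another source.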

Let me know if you have any questions and please do tell me if I can help you in any way. I have never worked on any Android app at all but I’d be willing to learn for sure.

That looks quite nice. Obviously we'd need to rewrite this in Java or Kotlin but that shouldn't be too hard. I really like the idea of the diff tool to find the best result. I think it would be nice to implement this for all sources.

Do you want to create a fork and a pull request? That way we could work on this together (if you like). I think GitHub allows pull requests with no changes.

Honestly, I have no idea how any of this works. I have never used GitHub except to store my own code. Also, I have literally zero experience in writing Kotlin or Java.

We should also have a look whether Deezer has any licensing shenanigans going on for their lyrics. So far I have only been using this code to make some LRC files for my own private use but if we implement this in a public app maybe we should double check this.

Hm, that's fine. Then I guess I'll code that myself. However, if you are interested, I invite you to give input on it (once I've done some work, of course). After all, it's your idea.

Oh and by the way I think that would be a good opportunity to learn how to do contributions. I'm also not an expert on any of that (just learned through trial and error).

Small update: I just checked what happens when you request the lyrics of a song which doesn’t have synchronized lyrics. It nopes out with an error as I expected and just writes the ID tags into the file. Then I also checked what happens when Deezer has lyrics but they aren’t synced. It also errored out. However I have changed the API body and put the whole “writing into file” part into a try...except statement so it tries to write the synced lyrics and if it errors, it goes into another try...except where it tries to write the non-timestamped lyrics into a txt file and if that doesn’t work it just has a pass.
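For illustration, the same synced, then unsynced, then nothing fallback can be written with explicit checks instead of nested try...except. This is a sketch, not the code described above: `extract_lyrics` is a made-up name, and `"text"` as the key for the unsynced lyrics is an assumption about the changed API body.

```python
def extract_lyrics(api_response):
    """Return ("synced", lines), ("plain", text) or (None, None)."""
    try:
        lyrics = api_response["data"]["track"]["lyrics"]
    except (KeyError, TypeError):
        return None, None
    if not isinstance(lyrics, dict):
        # No lyrics object at all for this track.
        return None, None
    synced = lyrics.get("synchronizedLines")
    if synced:
        return "synced", [e["lrcTimestamp"] + e["line"] for e in synced]
    plain = lyrics.get("text")  # assumed field name for unsynced lyrics
    if plain:
        return "plain", plain
    return None, None


# Hypothetical responses covering the three cases.
synced_response = {"data": {"track": {"lyrics": {"synchronizedLines": [
    {"lrcTimestamp": "[00:12.00]", "line": "Line 1 lyrics"},
]}}}}
plain_response = {"data": {"track": {"lyrics": {
    "synchronizedLines": None, "text": "Line 1 lyrics\nLine 2 lyrics",
}}}}
empty_response = {"data": {"track": {"lyrics": None}}}

for response in (synced_response, plain_response, empty_response):
    print(extract_lyrics(response))
```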

Just to be clear: I can't integrate Python code into the code base so you don't need to do this for FastLyrics

I know, this was just out of curiosity for my own code.

Sounds like a great idea, I have no idea where to start learning about this though. Unless you want to teach me via Discord or something, hahaha (I’m German too btw)... I might just read up on some basics regarding GitHub tomorrow.

I'm ok with both

api_response_raw = requests.post(api_url, headers = api_headers, json = api_body)

What url is this api_url?

api_url = 'https://pipe.deezer.com/api'

Sorry, forgot to copy it when writing up the explanation above.

Have you given any amount of thought towards the issue of how you will display the lyrics in time with the music itself? I don’t know how often you poll the notifications (as far as I can see there is a setting for auto refresh, how frequently does it do that?). Also, I just thought about this: You are saving the downloaded lyrics in a database, right? What is the data type of your saved lyrics? Is it just multi-line strings? Instead of using LRC as an actual file type you could make your own file type which adheres to the LRC standard but maybe you extend it by specifying on the first line whether the file has timestamps or not. Because then you could have both unsynced and synced lyrics in the same file format.

So basically

[sync: false]
[ti: ...]
[al: ...]
[ar: ...]
Lyrics line 1
Lyrics line 2
...


[sync: true]
[ti: ...]
[al: ...]
[ar: ...]
[00:13.59]Lyrics line 1
[00:18.46]Lyrics line 2
...
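A small sketch of how a file in this proposed format could be read, and how the line for a given playback position could be looked up. `parse_lyrics` and `current_line` are made-up names, and the sample text is a placeholder. How the app obtains the playback position is left open here; the lookup just takes it in seconds.

```python
import bisect
import re

TAG = re.compile(r"^\[([a-z]+):\s*(.*)\]$")
TIMED = re.compile(r"^\[(\d+):(\d+(?:\.\d+)?)\](.*)$")


def parse_lyrics(text):
    """Parse the proposed format into (tags, lines).

    lines is a list of (seconds, text); seconds is None for unsynced lines.
    """
    tags, lines = {}, []
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        timed = TIMED.match(raw)
        tag = TAG.match(raw)
        if timed:
            minutes, seconds, lyric = timed.groups()
            lines.append((int(minutes) * 60 + float(seconds), lyric))
        elif tag:
            tags[tag.group(1)] = tag.group(2)
        else:
            lines.append((None, raw))
    return tags, lines


def current_line(lines, position_seconds):
    """Return the lyric showing at the given position (synced lines only)."""
    starts = [start for start, _ in lines]
    # Index of the last line whose start time is <= the playback position.
    index = bisect.bisect_right(starts, position_seconds) - 1
    return lines[index][1] if index >= 0 else ""


sample_text = """[sync: true]
[ti: Example]
[00:13.59]Lyrics line 1
[00:18.46]Lyrics line 2
"""
tags, lines = parse_lyrics(sample_text)
print(tags["sync"], current_line(lines, 15.0))
```

One caveat with this sketch: in an unsynced file, a lyric line that happens to look like `[word: text]` would be read as a tag, so a real implementation would probably only accept tags at the top of the file.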

And sorry for the comment spam, but I just had another thought: You might have to come up with some sort of unique identifier (perhaps based on track length?) because sometimes there are different versions of songs with identical titles and then the lyrics would be mismatched in terms of sync. Regardless, you should have a toggle button in the UI anyway to switch between synced and non-synced lyrics versions.
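To make the track-length idea concrete, here is a minimal sketch that picks the candidate whose length is closest to the track that is playing. `closest_by_duration` is a made-up name, the two-second tolerance is an arbitrary choice, and the code assumes each candidate carries a `duration` field in seconds; the sample data is hypothetical.

```python
def closest_by_duration(candidates, playing_seconds, tolerance=2):
    """Return the candidate whose duration is closest to the playing track,
    or None if even the closest one is off by more than the tolerance."""
    if not candidates:
        return None
    best = min(candidates, key=lambda c: abs(c["duration"] - playing_seconds))
    if abs(best["duration"] - playing_seconds) > tolerance:
        return None
    return best


# Hypothetical candidates: two versions of a song with the same title.
versions = [
    {"id": 1, "duration": 219},
    {"id": 2, "duration": 245},
]
print(closest_by_duration(versions, 218))
```

Returning `None` when nothing is close enough gives the app a natural point to fall back to unsynced lyrics rather than show lines at the wrong times.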

I've created a pull request with a new branch. This will be where all the commits land until everything gets merged into master at once.

Looks good even though I don’t understand most of the code.

Wjxfi commented

?

As of 0.5.0 this is implemented. Hope you like it

There are a couple bugs I’ve noticed. Should I open a new issue or use this one?

New issues are better