ewenme/geniusr

get_lyrics_id issue

Closed this issue · 13 comments

Whenever I request the lyrics for a song, I only get a blank tibble instead of a tibble with the lyrics. This is also a problem with get_lyrics_search. Other functions like search_song and get_song work just fine though. The output is shown below. Thank you so much for your help.

get_lyrics_id(song_id=50158)
A tibble: 0 x 6
... with 6 variables: line , section_name , section_artist , song_name , artist_name , song_id

same issue

I created this patch to fix this issue on my local box.

4c4
<   lyrics <- html_nodes(session, ".lyrics p")
---
> #  lyrics <- html_nodes(session, ".lyrics p")
5a6,8
> # edit 11/21/2021
>   lyrics <-  session %>% html_nodes(xpath = '//div[contains(@class, "Lyrics__Container")]')#
>
7,8c10,11
<   song <- html_nodes(session, ".header_with_cover_art-primary_info-title") %>%
<     html_text()
---
> #  song <- html_nodes(session, ".header_with_cover_art-primary_info-title") %>%
> #    html_text()
10,11c13,22
<   artist <- html_nodes(session, ".header_with_cover_art-primary_info-primary_artist") %>%
<     html_text()
---
> # edit 11/21/2021
>   song <-  session %>% html_nodes(xpath = '//span[contains(@class, "SongHeaderVariantdesktop__")]') %>%
>     html_text(trim = TRUE)
>
> #  artist <- html_nodes(session, ".header_with_cover_art-primary_info-primary_artist") %>%
> #    html_text()
>
> # edit 11/21/2021
>     artist <-  session %>% html_nodes(xpath = '//a[contains(@class, "SongHeaderVariantdesktop__Artist")]') %>%
>       html_text(trim = TRUE)

I am not much of a coder. How do you apply this patch?

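# Packages the patched function relies on when it is run from the console
library(rvest)   # html_nodes(), html_text(), %>%
library(xml2)    # xml_find_all(), xml_add_sibling(), xml_remove()
library(tibble)  # tibble()
library(purrr)   # is_empty()
get_lyrics <- function (session) {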
  lyrics <-  session %>% html_nodes(xpath = '//div[contains(@class, "Lyrics__Container")]')
  song <-  session %>% html_nodes(xpath = '//span[contains(@class, "SongHeaderVariantdesktop__")]') %>% html_text(trim = TRUE)
  artist <-  session %>% html_nodes(xpath = '//a[contains(@class, "SongHeaderVariantdesktop__Artist")]') %>% html_text(trim = TRUE)
  xml_find_all(lyrics, ".//br") %>% xml_add_sibling("p", "\n")
  xml_find_all(lyrics, ".//br") %>% xml_remove()
  lyrics <- html_text(lyrics, trim = TRUE)
  lyrics <- unlist(strsplit(lyrics, split = "\n"))
  lyrics <- grep(pattern = "[[:alnum:]]", lyrics, value = TRUE)
  if (is_empty(lyrics)) {
    return(tibble(line = NA, section_name = NA, section_artist = NA, 
                  song_name = song, artist_name = artist))
  }
  section_tags <- nchar(gsub(pattern = "\\[.*\\]", "", lyrics)) == 0
  sections <- geniusr:::repeat_before(lyrics, section_tags)
  sections <- gsub("\\[|\\]", "", sections)
  sections <- strsplit(sections, split = ": ", fixed = TRUE)
  section_name <- sapply(sections, "[", 1)
  section_artist <- sapply(sections, "[", 2)
  section_artist[is.na(section_artist)] <- artist
  tibble(line = lyrics[!section_tags], section_name = section_name[!section_tags], 
         section_artist = section_artist[!section_tags], song_name = song, 
         artist_name = artist)
}
assignInNamespace("get_lyrics", get_lyrics, "geniusr")

@MalcolmMashig - thanks so much for the fix above! You should submit as a PR, I was also having the same issue.

Implemented the fix following @MalcolmMashig. As far as I understand, this creates a new function get_lyrics, and I don't understand how to use it. I tried these, but they give errors about unused arguments.

a <- get_lyrics(song_lyrics_url = "https://genius.com/Adje-base-lyrics")
b <- get_lyrics(song_id = 3039923)
c <- get_lyrics(artist_name = "Anderson .Paak", song_title = "Come Home")

The old functions get_lyrics_id, get_lyrics_url and get_lyrics_search still return empty dataframes.

aa <- get_lyrics_id(song_id = 3039923)
ab <- get_lyrics_url(song_lyrics_url = "https://genius.com/Adje-base-lyrics")
ac <- get_lyrics_search(artist_name = "Anderson .Paak", song_title = "Come Home")

@mattroumaya how did you do it?

Hey @elinevisser23! I'm not the one who wrote the patch, but I was able to implement it. It's actually overwriting one of the package's internal (non-exported) functions; check out ?assignInNamespace for more info. If you look at lyrics.r, you'll see the function it's replacing right there on line 1. You could probably just run @MalcolmMashig's code as is in the console. You might have to load a couple of additional packages/dependencies first (e.g. rvest and xml2). After running it in the console, you should be able to use the get_lyrics_* functions as normal with their typical arguments.

Alternatively, depending on your use case you could paste it into a separate file, say lyric_patch.R and load it into whatever you're using it for with source("lyric_patch.R").

Hope this helps!

The patch doesn't appear to work anymore; I'm getting this error:

get_lyrics_id(song_id = 904479)

Error in section_artist[is.na(section_artist)] <- artist : 
  replacement has length zero
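
For anyone wondering where that message comes from: it is what R reports when the value being assigned is empty, which is presumably what happens to artist when the XPath in the patch no longer matches anything on the page. Below is a minimal base-R sketch of the failure and one possible guard. The sample values are made up, and the guard is only a suggestion, not part of the patch or of geniusr.

```r
# Made-up stand-ins for the values inside get_lyrics()
section_artist <- c("Artist A", NA)
artist <- character(0)  # what html_text() gives back when no node matched

# Reproduce the failure and capture its message instead of stopping
msg <- tryCatch({
  section_artist[is.na(section_artist)] <- artist
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Possible guard (an assumption, not from the patch): fall back to NA
if (length(artist) == 0) artist <- NA_character_
section_artist[is.na(section_artist)] <- artist
```

With the guard in place the function would return NA for the artist instead of stopping, so the lyrics themselves still come through.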

@mulderc

Change these two lines in the patch above and see what you get. Haven't had time to thoroughly check, but I was able to pull lyrics.

song <- session %>% html_nodes(xpath = '//span[contains(@class, "SongHeaderdesktop__")]') %>% html_text(trim = TRUE)
artist <- session %>% html_nodes(xpath = '//a[contains(@class, "SongHeaderdesktop__Artist")]') %>% html_text(trim = TRUE)

Changing

SongHeaderVariantdesktop__ -----> SongHeaderdesktop__

closed by #18. thanks everyone! :)

hey guys,

I tried the patch by @MalcolmMashig and the line changes suggested by @morosophist, but it still doesn't work. Does anyone know what's up with that? :(

thanks in advance!

This is an old issue but I'm going to post the fix for everyone still struggling with it!
I was facing the same problem as @naddafli and basically you also need to replace this:

artist <- session %>% html_nodes(xpath = '//a[contains(@class, "SongHeaderdesktop__Artist")]') %>% html_text(trim = TRUE)

with this:

artist <- session %>% html_nodes(xpath = '//a[contains(@class, "HeaderArtistAndTracklistdesktop__Artist")]') %>% html_text(trim = TRUE)
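
Since the class name has now changed a few times in this thread, one option is to try the known names in order and take the first that matches. This is a rough sketch using xml2 only. first_match_text is a hypothetical helper (not part of geniusr or rvest), and the HTML snippet is made up just to make the example runnable.

```r
library(xml2)

# Hypothetical helper: text of the first XPath in `candidates` that matches,
# or NA if none of them match.
first_match_text <- function(page, candidates) {
  for (xp in candidates) {
    nodes <- xml_find_all(page, xp)
    if (length(nodes) > 0) return(xml_text(nodes, trim = TRUE))
  }
  NA_character_
}

# Artist class names mentioned in this thread, newest first
artist_xpaths <- c(
  '//a[contains(@class, "HeaderArtistAndTracklistdesktop__Artist")]',
  '//a[contains(@class, "SongHeaderdesktop__Artist")]',
  '//a[contains(@class, "SongHeaderVariantdesktop__Artist")]'
)

# Made-up markup standing in for a song page
page <- read_html('<a class="SongHeaderdesktop__Artist-sc-1"> Example Artist </a>')
artist <- first_match_text(page, artist_xpaths)
print(artist)
```

This only helps for names already on the list; a brand-new class name would still come back as NA.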

Afraid the get_lyrics commands are broken again w.r.t. the artist data. Does anyone have a fix, or a way to find out when it will need fixing again? The Genius API pages don't exactly make their changes plain...
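
On spotting breakage early: I don't know of any change notice from Genius, but a small check that reports which selectors no longer match anything could be run before a scraping job. A minimal sketch follows. check_selectors is a hypothetical helper, the selectors are the ones quoted in this thread, and the HTML is made up so the example runs offline. In practice you would pass in a parsed song page instead.

```r
library(xml2)

# Hypothetical helper: TRUE/FALSE per selector, TRUE if it matches any node
check_selectors <- function(page, selectors) {
  vapply(selectors,
         function(xp) length(xml_find_all(page, xp)) > 0,
         logical(1))
}

selectors <- c(
  lyrics = '//div[contains(@class, "Lyrics__Container")]',
  artist = '//a[contains(@class, "HeaderArtistAndTracklistdesktop__Artist")]'
)

# Made-up markup: the lyrics class is present, the artist class was renamed
page <- read_html(paste0(
  '<div class="Lyrics__Container-sc-1">la la</div>',
  '<a class="SomethingElse__Artist">Example Artist</a>'
))

status <- check_selectors(page, selectors)
broken <- names(status)[!status]
if (length(broken) > 0) {
  message("Selectors with no match: ", paste(broken, collapse = ", "))
}
```

A selector showing up in broken is a hint that the page markup changed and the patch needs a new class name.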