Easily access song lyrics from Genius in a #tidy format.

---
title: "Quickstart: geniusR"
author: "Josiah Parry"
date: "2/12/2018"
---

# Overview

This package provides an easy way to access song lyrics as text data from the website [Genius](https://genius.com).
  
## Installation  
This package must be installed from GitHub.

```{r, eval = FALSE}
devtools::install_github("josiahparry/geniusR")
```

Load the package:
```{r}
library(geniusR)
library(tidyverse) # For manipulation
```


# Getting Lyrics

## Whole Albums

`genius_album()` downloads the lyrics for an entire album in a `tidy` format. The two main arguments are `artist` and `album`. Supply the quoted name of the artist and of the album (if this gives you trouble, check that the artist and album names match those on [Genius](https://genius.com)).

This returns a tidy data frame with three columns:
  
  - `title`: track name
  - `track_n`: track number
  - `text`: lyrics


```{r}
emotions_math <- genius_album(artist = "Margaret Glaspy", album = "Emotions and Math")
emotions_math
```
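
Once the lyrics are in this shape, ordinary `dplyr` verbs apply. The sketch below counts lyric lines per track. To stay runnable offline it uses a small stand-in tibble (`album_lyrics_demo`, a hypothetical name with placeholder titles and lyrics) that only mimics the three columns described above; it is not real output from `genius_album()`.

```r
library(dplyr)
library(tibble)

# Stand-in data with the same columns as the genius_album() output
# described above; the titles and lyrics are placeholders.
album_lyrics_demo <- tibble(
  title   = c("Song A", "Song A", "Song A", "Song B", "Song B"),
  track_n = c(1L, 1L, 1L, 2L, 2L),
  text    = c("first line", "second line", "third line",
              "another first line", "another second line")
)

# One row per track, with the number of lyric lines it contains
lines_per_track <- album_lyrics_demo %>%
  count(track_n, title, name = "n_lines") %>%
  arrange(track_n)

lines_per_track
```

The same pipeline should work on a real result such as `emotions_math`, provided it has the columns listed above.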


## Multiple Albums

If you wish to download multiple albums from multiple artists, try to keep it tidy and avoid binding rows if you can. One tidy workflow is to create a tibble with two columns, `artist` and `album`, where each row is an artist and one of their albums. We can then iterate over those columns with `purrr::map2()`.

In this example I will extract three albums each from Kendrick Lamar and Sara Bareilles (two of my favorite musicians). The first step is to create the tibble with artists and album titles.

```{r}
albums <- tibble(
  artist = c(
    rep("Kendrick Lamar", 3), 
    rep("Sara Bareilles", 3)
    ),
  album = c(
    "Section 80", "Good Kid, M.A.A.D City", "DAMN.",
    "The Blessed Unrest", "Kaleidoscope Heart", "Little Voice"
    )
)

albums
```

Now we can iterate over each row using the `map2()` function. This feeds each pair of values from the `artist` and `album` columns to `genius_album()`. Calling `map2()` inside `dplyr::mutate()` creates a list column where each value is the `tibble` returned by `genius_album()`. We will unnest this later.

```{r}
## We will have an additional artist column that will have to be dropped
album_lyrics <- albums %>% 
  mutate(tracks = map2(artist, album, genius_album))

album_lyrics
```


When you view the result, each value within the `tracks` column is shown as `<tibble>`. This means that the value is in fact another `tibble`. We expand it using `tidyr::unnest()`.


```{r}
# Unnest the lyrics to expand 
lyrics <- album_lyrics %>% 
  unnest(tracks) %>%    # Expanding the lyrics 
  arrange(desc(artist)) # Arranging by artist name 

head(lyrics)
```
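
Because a single failed lookup aborts the whole `map2()` call, it may help to wrap the fetching function with `purrr::possibly()`. This is a minimal sketch, not something the package does for you. `fetch_album_demo()`, `albums_demo`, and the artist and album names are hypothetical stand-ins so the example runs offline; in practice you would wrap `genius_album` instead.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for genius_album(); it fails for one album so the
# example runs offline and exercises the failure path.
fetch_album_demo <- function(artist, album) {
  if (album == "Not A Real Album") stop("album page not found")
  tibble(title = paste(album, "track"), track_n = 1L, text = "placeholder lyric")
}

# possibly() returns `otherwise` (here NULL) instead of raising the error
safe_fetch <- possibly(fetch_album_demo, otherwise = NULL)

albums_demo <- tibble(
  artist = c("Artist A", "Artist B"),
  album  = c("Real Album", "Not A Real Album")
)

lyrics_demo <- albums_demo %>%
  mutate(tracks = map2(artist, album, safe_fetch)) %>%
  filter(!map_lgl(tracks, is.null)) %>%  # drop the albums that failed
  unnest(tracks)

lyrics_demo
```

Filtering out the `NULL` entries before `unnest()` makes it explicit which albums were skipped; inspecting the rows where `tracks` is `NULL` tells you which names to double check on Genius.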


## Song Lyrics

### `genius_lyrics()`

If you want only a single song, you can use `genius_lyrics()`. Supply an artist and a song title as character strings, and voila.

```{r}
memory_street <- genius_lyrics(artist = "Margaret Glaspy", song = "Memory Street")

memory_street
```

This returns a `tibble` with three columns: `title`, `text`, and `line`. However, you can control how much information is returned using the `info` argument.

  - `info = "title"` (default): Return the lyrics, line number, and song title.
  - `info = "simple"`: Return just the lyrics and line number.
  - `info = "artist"`: Return the lyrics, line number, and artist.
  - `info = "all"`: Return lyrics, line number, song title, artist.


## Tracklists

Given an `artist` and an `album`, `genius_tracklist()` returns a barebones `tibble` with the track title, track number, and the url to the lyrics.

```{r}
genius_tracklist(artist = "Basement", album = "Colourmeinkindness") 
```

## Nitty Gritty

`genius_lyrics()` generates a url to Genius which is fed to `genius_url()`, the function that does the heavy lifting of actually fetching lyrics. 

I have not figured out all of the patterns that Genius uses for its urls, so errors are bound to happen. If `genius_lyrics()` returns an error, try using `genius_tracklist()` and `genius_url()` together to get the song lyrics.

For example, say "(No One Knows Me) Like the Piano" by _Sampha_ wasn't working in a standard `genius_lyrics()` call. 

```{r, eval = FALSE}
piano <- genius_lyrics("Sampha", "(No One Knows Me) Like the Piano")
```

We could grab the tracklist for the album _Process_, which the song is from. We could then isolate the url for _(No One Knows Me) Like the Piano_ and feed that into `genius_url()`.

```{r}
# Get the tracklist for the album Process
process <- genius_tracklist("Sampha", "Process")

# Filter down to find the individual song
piano_info <- process %>% 
  filter(title == "(No One Knows Me) Like the Piano")

# Alternatively, filter using case-insensitive string detection
# process %>% 
#  filter(stringr::str_detect(title, stringr::coll("Like the piano", ignore_case = TRUE)))

piano_url <- piano_info$track_url
```

Now that we have the url, feed it into `genius_url()`.

```{r}
genius_url(piano_url, info = "simple")
```
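
The fallback described in this section can be folded into one helper. The sketch below is an assumption about how you might do that, not part of the package: `lyrics_with_fallback()` is a hypothetical name, and the three fetcher arguments default to the package functions but can be swapped for stand-ins when experimenting offline.

```r
# Hypothetical helper: try the direct lookup first and, if it errors,
# fall back to the tracklist + url route shown above.
lyrics_with_fallback <- function(artist, song, album,
                                 fetch_song = geniusR::genius_lyrics,
                                 fetch_list = geniusR::genius_tracklist,
                                 fetch_url  = geniusR::genius_url) {
  tryCatch(
    fetch_song(artist, song),
    error = function(e) {
      message("Direct lookup failed, trying the tracklist: ", conditionMessage(e))
      tracks <- fetch_list(artist, album)
      # Case-insensitive exact match on the track title
      hit <- tracks[tolower(tracks$title) == tolower(song), ]
      if (nrow(hit) == 0) stop("Song not found in tracklist: ", song)
      fetch_url(hit$track_url[[1]])
    }
  )
}
```

With the package installed, a call might look like `lyrics_with_fallback("Sampha", "(No One Knows Me) Like the Piano", "Process")`. The exact title match is a deliberate simplification; the string-detection filter shown earlier is more forgiving.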

___________


# On the Internals 

## Generative functions

This package works almost entirely on pattern detection. The urls from _Genius_ are (mostly) easily reproducible (shout out to [Angela Li](https://twitter.com/CivicAngela) for pointing this out). 

The two functions that generate urls are `gen_song_url()` and `gen_album_url()`. To see how the functions work, try feeding an artist and song title to `gen_song_url()` and an artist and album title to `gen_album_url()`. 

```{r}
gen_song_url("Laura Marling", "Soothing")
```

```{r}
gen_album_url("Daniel Caesar", "Freudian")
```
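
To illustrate the kind of pattern involved, here is a rough sketch of building a song url from an artist and a title. It is an assumption about the general url shape (artist and title joined by hyphens, followed by `-lyrics`), written with base R only. `sketch_song_url()` is a hypothetical name and is not the package's actual `gen_song_url()` implementation, which may handle more edge cases.

```r
# Hypothetical sketch of url generation; not the package's own code.
sketch_song_url <- function(artist, song) {
  slug <- paste(artist, song, "lyrics")
  slug <- gsub("[^[:alnum:] ]", "", slug)  # drop punctuation such as ( ) . ,
  slug <- gsub(" +", "-", trimws(slug))    # collapse spaces into hyphens
  slug <- tolower(slug)
  # Capitalise only the first character of the slug
  slug <- paste0(toupper(substring(slug, 1, 1)), substring(slug, 2))
  paste0("https://genius.com/", slug)
}

sketch_song_url("Laura Marling", "Soothing")
# "https://genius.com/Laura-marling-soothing-lyrics"
```

Titles with accents, ampersands, or featured artists are the sort of input where a simple rule like this is likely to break, which is why the tracklist route is useful as a fallback.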

`genius_lyrics()` calls `gen_song_url()` and feeds the output to `genius_url()`, which performs the scraping.

Getting lyrics for albums is slightly more involved. `genius_album()` first calls `genius_tracklist()`, which in turn calls `gen_album_url()` and then uses the handy package `rvest` to scrape the song titles, track numbers, and song lyric urls. Next, the song urls from the output are iterated over and fed to `genius_url()`.

To make this clearer, take a look inside `genius_album()`:

```{r}
genius_album <- function(artist = NULL, album = NULL, info = "simple") {

  # Obtain tracklist from genius_tracklist
  album <- genius_tracklist(artist, album) %>%

    # Iterate over the track urls, fetching the lyrics for each song
    mutate(lyrics = map(track_url, genius_url, info)) %>%

    # Unnest the tibble with lyrics
    unnest(lyrics) %>%
    
    # Deselect the track url
    select(-track_url)


  return(album)
}
```


### Notes:

As this is my first _"package"_ there will be many issues. Please submit an issue and I will do my best to attend to it. 

There are already issues of which I am aware (such as the lack of error handling). If you would like to take those on, please go ahead and make a pull request. You can also contact me on [Twitter](https://twitter.com/josiahparry).