SQL and iTunes API

This morning we will be querying data from the Chinook Database and collecting new information with the iTunes API!

# SQL Connection and Querying
import sqlite3

# Data manipulation
import pandas as pd

# API Connection
import requests

#for tests
from test_background import test_obj_dict, run_test_dict, pkl_dump, run_test

The database in stores within the data folder of this repo as chinook.db

In the cell below:

Open up a connection to chinook.db and store that connection as conn

# Your code here
conn = None

Let's create a function called sql_df that returns a dataframe of a SQL query.

In the cell below:

Define a function called sql_df that takes in two parameters query, and connection.
Return a dataframe from the function using pd.read_sql

# Your code here

Above is the schema for the Chinook Database.

For this warmup, we will be focusing on the tracks, albums, and artists tables.

We'll start with something simple for our first SQL query.

In the cell below:

Write a SQL query to collect all columns from the tracks table
LIMIT the query to only return two records from the table.
Run the query through the sql_df function and save the results as the variable first_query.

#Your code here
QUERY = None
first_query = sql_df(None)

Run the cell below to see if your query returned the correct data!

run_test(first_query, 'first_query')

Ok ok

Let's do a more complex query.

In the cell below:

Write a sql query that selects
- The Name column from the tracks table
  - Alias this column as Song
- The Title column from the albums table
  - Alias this column as Album
- The Name column from the artists table
  - Alias this column as Artist
- Use the WHERE command to only return results where Artist = 'U2'
- LIMIT the results to 15 observations.

Hint: This will require you to first join the tracks and albums tables, and then join the artists table.

We'll save the results of this query to the variable df

# Your code here

QUERY = None

df = sql_df(QUERY, conn)
df.head()

Run the cell below to see if you returned the correct results!

run_test(df, 'query_to_df')

Our df variable is made up of a table with three columns

Song	Album	Artist
Name of song	Name of album	Name of artist

Let's collect the release date of each song using the iTunes Search API and add it as a column to our dataframe

To do this, we will use three paramaters. term, entity, and limit.

term

The iTunes API interprets this paramater the same way it iterprets a search term you would manually type when searching for songs on iTunes

entity

This parameter is a filter for our search. We can filter our search to only return albums, music videos, etc. For this warmup we will be filtering our search to only return songs

limit

The limit parameter determines how many search results are returned. The minimum is 50, the maximum is 200. For this warmup, we will set this to 200

The iTunes API only allows 20 requests per minute.

We could search each song individually with

req_string = 'https://itunes.apple.com/search?term={}&entity=song&limit=200'.format(song)
requests.get(req_string).json()

But what if our code malfunctions? We risk sending too many requests, at which point the iTunes API will refuse our requests! We don't want that.

Fortunately for us, we can collect the data we need with a single request.

Instead of using the name of the song as our search term, we will use the name of the artist and keep our entity parameter as song so the results return individual songs created by the artist.

In the cell below:

Replace None with the variable needed to search the iTunes API for songs by the artist U2

req_string = 'https://itunes.apple.com/search?term={}&entity=song&limit=200'.format(None)

Run the cell below to see if your req_string variable is correct!

run_test(req_string, 'req_string')

Now that we have our req_string, we can send our request to the API using the requests library.

req = requests.get(req_string).json()

The data returned from the API is formatted as a json which for most intents and purposes is just a dictionary.

The information we want from this json is found with the 'results' key.

api_data = req['results']
type(api_data)

Ok ok! Now that we have our data from the api, Here's what we need to do:

Create an empty list called dates.
Loop over the rows of our dataframe
Collect the song name and artist name from the dataframe row.
Loop over each result in api_data.
Check if:
- The song in our dataframe row is equal to result['trackName']
- The artist in our dataframe row is equal to result['artistName']
If the song and artist match:
- Set the variable release_date to result['releaseDate']
- Append release_date to the dates list
- break out of the nested for loop.
If the song and artist do not match:
- Set release_date to None
If the nested for loop completes and release_date still equals None:
- Append None to the dates list.
Add a release_date column to df using the dates list.

dates = []
# Your code here

Let's take a look at the results

# Run this cell as is
df

Run the cell below to check if you collected the correct release dates!

run_test(df, 'release_date')

It looks like there are several None's in our release_date column. And there is a reason for this.

The word the is capitalized in our Chinook database but is lowercased by the iTunes API.

Run the cell below to see why this is a problem.

'Even Better Than The Real Thing' == 'Even Better Than the Real Thing'

Yikes! So how do we solve this?

We can solve this by lowering all strings when we compare them so no matter what, the strings are cased exactly the same.

'Even Better Than The Real Thing'.lower() == 'Even Better Than the Real Thing'.lower()

Copy and paste your code from above in the cell below, except this time lower all string values.

# Your code here

# Run this cell as is
df

Run the cell below to see if you were successful!

run_test(df, 'string_manip')

lecritch/warmup-sql-api

SQL and iTunes API