- 04/26/22 & 04/28/22
- 04/26/22
- For this CodeAlong, we will be working with the Yelp API.
- You will use the Yelp API to search your hometown for a cuisine type of your choice.
- Next class, we will then use Plotly Express to create a map with the Mapbox API to visualize the results.
Part 1:
- Yelp API: Getting Started
- Python package "YelpAPI": https://github.com/gfairchild/yelpapi
Part 2:
- Plotly Express: https://plotly.com/python/getting-started/
- With Mapbox API: https://www.mapbox.com/
- px.scatter_mapbox
Documentation:
- Plotly Express: https://plotly.com/python/getting-started/
- Efficient API Calls Lesson Link: https://login.codingdojo.com/m/376/12529/88078
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Additional Imports
import os, json, math, time
from yelpapi import YelpAPI
from tqdm.notebook import tqdm_notebook
Check the official API documentation to know what arguments we can search for: https://www.yelp.com/developers/documentation/v3/business_search
# Load API Credentials
# Instantiate YelpAPI Variable
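A minimal sketch of loading credentials, assuming your key is stored in a JSON file such as ~/.secret/yelp_api.json under a key named "api-key". The path, the key name, and the helper name load_credentials are all assumptions; match them to your own .secret file.

```python
import json
from pathlib import Path


def load_credentials(path):
    """Hypothetical helper: read a JSON file of API credentials and return it as a dict."""
    # expanduser() lets paths like "~/.secret/yelp_api.json" work
    with open(Path(path).expanduser(), 'r') as f:
        return json.load(f)
```

Usage: login = load_credentials('~/.secret/yelp_api.json'), then yelp_api = YelpAPI(login['api-key']). Keeping the key in a file outside the project folder keeps it out of version control.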
# set our API call parameters and filename before the first call
## Specify folder for saving data
# Specifying JSON_FILE filename (can include a folder)
JSON_FILE = None
## Check if JSON_FILE exists
## If it does not exist:
## CREATE ANY NEEDED FOLDERS
# Get the Folder Name only
## If JSON_FILE included a folder:
# create the folder
## INFORM USER AND SAVE EMPTY LIST
## save the first page of results
## If it exists, inform user
## Load previous results and use len of results for offset
## set offset based on previous results
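The file-checking steps above can be sketched as a small helper. create_json_file and its delete_if_exists flag are hypothetical names. This version only prepares the file and returns the offset (the number of results already saved); it leaves the first API call to the next step rather than saving the first page inside the helper.

```python
import json
import os


def create_json_file(json_file, delete_if_exists=False):
    """Hypothetical helper: start or resume a results file; return how many results it holds."""
    if os.path.isfile(json_file) and delete_if_exists:
        os.remove(json_file)

    if not os.path.isfile(json_file):
        # Create any folders included in the path (e.g. "Data/")
        folder = os.path.dirname(json_file)
        if folder:
            os.makedirs(folder, exist_ok=True)
        # Inform the user and save an empty list
        print(f'[i] {json_file} not found. Saving empty list to file.')
        with open(json_file, 'w') as f:
            json.dump([], f)
        return 0

    # The file exists: load previous results and use their count as the offset
    print(f'[i] {json_file} already exists.')
    with open(json_file, 'r') as f:
        previous_results = json.load(f)
    return len(previous_results)
```

Usage: n_results = create_json_file(JSON_FILE). A fresh run returns 0; a resumed run returns the number of businesses already on disk.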
- We will use this first result to check:
- How many total results are there?
- Where is the actual data we want to save?
- How many results do we get at a time?
# use our yelp_api variable's search_query method to perform our API call
## How many results total?
- Where is the actual data we want to save?
## How many did we get the details for?
results_per_page = None
results_per_page
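The first call would look something like results = yelp_api.search_query(location=LOCATION, term=TERM, offset=n_results), where LOCATION and TERM are your own search parameters (those variable names are assumptions). Yelp's business search response is a dict whose "total" key holds the total number of matches and whose "businesses" key holds the list we want to save. The sketch below reads both from a made-up, trimmed-down response so it runs without an API key; summarize_first_page is a hypothetical name.

```python
def summarize_first_page(results):
    """Hypothetical helper: return (total_results, results_per_page) from a search response."""
    total_results = results['total']               # how many matches exist in total
    results_per_page = len(results['businesses'])  # the data we save lives under 'businesses'
    return total_results, results_per_page


# Made-up response for illustration only (not real Yelp data)
example_results = {
    'businesses': [{'name': 'Example Cafe'}, {'name': 'Example Diner'}],
    'total': 37,
}
total_results, results_per_page = summarize_first_page(example_results)
print(total_results, results_per_page)  # 37 2
```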
- Calculate how many pages of results are needed to cover the total_results.
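A sketch of the page calculation. pages_needed is a hypothetical helper; subtracting n_results (the results already saved) fits the use-the-length-as-offset approach described earlier, and max(..., 0) guards against a negative count.

```python
import math


def pages_needed(total_results, results_per_page, n_results=0):
    """Hypothetical helper: pages still required to cover total_results."""
    remaining = max(total_results - n_results, 0)
    # math.ceil rounds up, so a partial final page is still requested
    return math.ceil(remaining / results_per_page)


print(pages_needed(37, 20))      # 2
print(pages_needed(37, 20, 20))  # 1
```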
# Use math.ceil to round up for the total number of pages of results.
for i in tqdm_notebook( range(1,n_pages+1)):
## The block of code we want to TRY to run
## Read in results in progress file and check the length
## save number of results to use as offset
## use n_results as the OFFSET
## append new results and save to file
## What to do if we get an error/exception.
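A minimal sketch of the loop above, written as a function so it can be tested without an API key. fetch_remaining_pages is a hypothetical name, and search_fn stands in for a call such as lambda offset: yelp_api.search_query(location=LOCATION, term=TERM, offset=offset). The short pause between calls is an assumption, not a documented requirement.

```python
import json
import time


def fetch_remaining_pages(search_fn, json_file, n_pages, sleep_s=0.2):
    """Hypothetical helper: request n_pages more pages and append them to json_file.

    search_fn(offset) must return a dict with a 'businesses' list.
    """
    for i in range(1, n_pages + 1):
        try:
            # Read in the results saved so far; their count is the next offset
            with open(json_file, 'r') as f:
                previous_results = json.load(f)
            n_results = len(previous_results)

            results = search_fn(n_results)

            # Append the new page and save everything back to the file
            previous_results.extend(results['businesses'])
            with open(json_file, 'w') as f:
                json.dump(previous_results, f)

            # Short pause between calls (an assumption, not a documented limit)
            time.sleep(sleep_s)
        except Exception as e:
            # Stop at the first error; pages already saved stay on disk
            print(f'[!] Error on page {i}: {e}')
            break
```

In the notebook, wrapping range(1, n_pages + 1) in tqdm_notebook adds the progress bar. Because the file is re-read on every pass, an interrupted run resumes from the correct offset.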
df = None
## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file
## Save it as a compressed csv (to save space)
size_json = os.path.getsize(JSON_FILE)
size_csv_gz = os.path.getsize(csv_file)
print(f'JSON FILE: {size_json:,} Bytes')
print(f'CSV.GZ FILE: {size_csv_gz:,} Bytes')
print(f'the csv.gz is {size_json/size_csv_gz:.2f} times smaller!')
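The JSON-to-CSV conversion can be sketched as one function (json_to_csv_gz is a hypothetical name). It assumes the JSON file holds the list of business dicts saved by the loop. pandas would also infer gzip from the .gz extension; passing compression='gzip' just makes it explicit.

```python
import pandas as pd


def json_to_csv_gz(json_file):
    """Hypothetical helper: load the saved businesses and write them as a gzip-compressed CSV."""
    df = pd.read_json(json_file)
    csv_file = json_file.replace('.json', '.csv.gz')
    df.to_csv(csv_file, compression='gzip', index=False)
    return df, csv_file
```

Usage: df, csv_file = json_to_csv_gz(JSON_FILE). Nested fields such as coordinates are written to the CSV as text, which matters when the file is loaded back in.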
- 04/28/22
- For this CodeAlong, we will be working with the Yelp API results from last class.
- You will load in the .csv.gz of your yelp results and prepare the data for visualization.
- You will use Plotly Express to create an interactive map with all of the results.
-
Part 1:
-
Yelp API:
- Getting Started:
-
YelpAPI
python package- "YelpAPI": https://github.com/gfairchild/yelpapi
-
-
Part 2:
- Plotly Express: https://plotly.com/python/getting-started/
- With Mapbox API: https://www.mapbox.com/
px.scatter_mapbox
Documentation:
- Plotly Express: https://plotly.com/python/getting-started/
- We want to create a map with every restaurant plotted as a scatter plot, with detailed information that appears when we hover over a business.
- We will use Plotly Express's px.scatter_mapbox function to accomplish this.
- We will need a Mapbox API token for some of the options.
## Plotly is not included in your dojo-env
!pip install plotly
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json
## importing plotly
import plotly.express as px
## Load in csv.gz
-
- We need to get the latitude and longitude for each business as separate columns.
- We also want to be able to show the restaurants:
- name,
- price range
- address
- and if they do delivery or takeout.
## use .apply pd.Series to convert a dict to columns
df['coordinates'].apply(pd.Series)
- Why didn't that work???
## slice out a single test coordinate
test_coord = None
test_coord
- It's not a dictionary anymore!!!
- CSV files can't store Python containers such as lists and dictionaries, so pandas writes them out as their string representations.
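A small in-memory round trip shows what happens. pandas writes each dict using Python's str(), so the cell comes back as text with single quotes. The business name and coordinates below are illustrative values.

```python
import io

import pandas as pd

# One-row frame whose 'coordinates' cell holds a real dict (illustrative values)
df = pd.DataFrame({'name': ['Example Cafe'],
                   'coordinates': [{'latitude': 39.29, 'longitude': -76.61}]})

# Round-trip through CSV text in memory instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_loaded = pd.read_csv(buffer)

test_coord = df_loaded.loc[0, 'coordinates']
print(type(test_coord))  # <class 'str'>
print(test_coord)        # {'latitude': 39.29, 'longitude': -76.61}
```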
- The json module has another version of load and dump called json.loads and json.dumps.
- These are designed to process STRINGS instead of files.
- If we use json.loads, we can convert our string dictionary into an actual dictionary.
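A quick comparison of the file and string versions, using a made-up coordinate string. io.StringIO stands in for an open file here.

```python
import io
import json

coord_string = '{"latitude": 39.29, "longitude": -76.61}'

from_string = json.loads(coord_string)            # loads: parse a str
from_file = json.load(io.StringIO(coord_string))  # load: parse a file-like object
back_to_string = json.dumps(from_string)          # dumps: dict -> str

print(type(from_string), from_string == from_file)  # <class 'dict'> True
print(back_to_string)
```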
## Use json.loads on the test coordinate
- JSON requires double quotes!
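The string pandas gave back uses Python's single quotes, so json.loads rejects it until the quotes are swapped. A sketch with an illustrative coordinate string:

```python
import json

# Illustrative string in the form pandas returns after reading the CSV
test_coord = "{'latitude': 39.29, 'longitude': -76.61}"

failed = False
try:
    json.loads(test_coord)
except json.JSONDecodeError as e:
    failed = True
    print('json.loads failed:', e)  # JSON property names must use double quotes

fixed = json.loads(test_coord.replace("'", '"'))
print(fixed['latitude'])  # 39.29
```

Swapping every quote is safe here because the coordinates only contain numbers; text fields that include apostrophes need more care.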
## replace single ' with "
## Use json.loads on the test coordinate, again
## replace ' with " (entire column)
## apply json.loads
## slice out a single test coordinate
## use .apply pd.Series to convert a dict to columns
df['coordinates'].apply(pd.Series)
## Concatenate the 2 new columns and drop the original.
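The column-wide version of these steps, sketched on a small illustrative frame that mimics the loaded CSV (the names and coordinates are made up):

```python
import json

import pandas as pd

# Illustrative frame: coordinates are string-dicts, as after pd.read_csv
df = pd.DataFrame({
    'name': ['Example Cafe', 'Example Diner'],
    'coordinates': ["{'latitude': 39.29, 'longitude': -76.61}",
                    "{'latitude': 39.3, 'longitude': -76.6}"],
})

# 1. Swap single quotes for double quotes so each string is valid JSON
df['coordinates'] = df['coordinates'].str.replace("'", '"')
# 2. Parse every string into a real dict
df['coordinates'] = df['coordinates'].apply(json.loads)
# 3. Expand the dicts into latitude/longitude columns
lat_long = df['coordinates'].apply(pd.Series)
# 4. Concatenate the new columns and drop the original
df = pd.concat([df.drop(columns='coordinates'), lat_long], axis=1)
print(df.columns.tolist())  # ['name', 'latitude', 'longitude']
```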
Mapbox API: https://www.mapbox.com/
## Load in mapbox api credentials from .secret
- Use the Plotly Express set_mapbox_access_token function (px.set_mapbox_access_token).
## set mapbox token
## use scatter_mapbox for M.V.P map
- We want to show each restaurant's:
- name
- price range
- address
- and if they do delivery or takeout.
- We can use the hover_name and hover_data arguments for px.scatter_mapbox to add this info!
## add hover_name (name) and hover_data for price,rating,location
## slice out a test address
Also a string-dictionary...
## replace ' with "
df['location'] = df['location'].str.replace("'", '"')
df
## apply json.loads
Ruh roh....
- Hmm, let's slice out a test_address again and let's write a function to accomplish this instead.
- We can use try and except in our function to get around the errors.
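A sketch of that function (load_address is a hypothetical name). It assumes the single quotes have already been replaced, as in the cell above, and returns the string "ERROR" instead of raising, so .apply can run over every row. TypeError covers missing values, which pandas reads in as NaN floats.

```python
import json


def load_address(address_string):
    """Hypothetical helper: json.loads a string-dict, or return "ERROR" if it fails."""
    try:
        return json.loads(address_string)
    except (json.JSONDecodeError, TypeError):
        return 'ERROR'
```

Usage: df['display_location'] = df['location'].apply(load_address).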
## slice out test address
test_addr = df.loc[0, 'location']
test_addr
## write a function to just run json.loads on the address
## test applying our function
- It worked! Now let's save this as a new column (display_location), and then let's investigate the businesses that had an "ERROR".
### save a new display_location column using our function
## filter for businesses with display_location == "ERROR"
## slice out a new test address and inspect
test_addr = df.loc[437, 'location']
test_addr
After some more investigation, we would find a few issues with these "ERROR" rows.
- They contained None.
- They contained an apostrophe in the name.
- ...?
- Use Regular Expressions to find and fix the display addresses with "'" in them.
- Use string split to split on the word "display_address".
- Then use string methods to clean up the result.
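An alternative to the regex clean-up, swapped in here rather than taken from the CodeAlong: because these cells were written with Python's str(), they are Python literals rather than JSON, so the standard library's ast.literal_eval can parse them directly, None and apostrophes included. It must be applied to the column as loaded from the CSV, before any ' to " replacement. parse_python_dict_string is a hypothetical name.

```python
import ast


def parse_python_dict_string(text):
    """Hypothetical helper: parse a str(dict)-style string, or return "ERROR"."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        # e.g. NaN for a missing location, or text that isn't a Python literal
        return 'ERROR'


# Illustrative location containing both problem cases: None and an apostrophe
original = {'address1': "123 O'Donnell St", 'address2': None,
            'display_address': ["123 O'Donnell St", 'Baltimore, MD 21201']}
as_saved = str(original)  # what ends up in the CSV cell
print(as_saved)

parsed = parse_python_dict_string(as_saved)
print(parsed == original)  # True
```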
## remove any rows where display_location == 'ERROR'
- We want the "display_address" key from the "display_location" dictionaries.
- We can use .apply with a lambda to slice out the desired key.
## use apply and lambda to slice correct key
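A sketch of removing the failed rows and slicing out the key, on an illustrative frame with one parsed dict and one "ERROR" row:

```python
import pandas as pd

# Illustrative rows: one parsed dict and one row that failed to parse
df = pd.DataFrame({
    'name': ['Example Cafe', 'Example Diner'],
    'display_location': [
        {'city': 'Baltimore', 'display_address': ['100 Main St', 'Baltimore, MD 21201']},
        'ERROR',
    ],
})

# Remove rows where parsing failed
df = df[df['display_location'] != 'ERROR'].copy()

# Slice the 'display_address' key out of each dict
df['display_address'] = df['display_location'].apply(lambda loc: loc['display_address'])
```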
- Almost done! We want to convert display_address to a string instead of a list of strings.
- We can use the string method .join to do so!
## slice out a test_address
## test using .join with a "\n"
## apply the join to every row with a lambda
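A sketch of the join step on illustrative addresses. "\n" reads nicely with print(), but Plotly hover labels are rendered as HTML, so "<br>" is what produces line breaks on the map.

```python
import pandas as pd

# Illustrative display_address lists, one per business
df = pd.DataFrame({'display_address': [
    ['100 Main St', 'Baltimore, MD 21201'],
    ['5 Harbor Pl', 'Suite 2', 'Baltimore, MD 21202'],
]})

test_address = df.loc[0, 'display_address']
print('\n'.join(test_address))  # one address line per printed line

# Join every row's list with "<br>" for use in hover labels
df['address'] = df['display_address'].apply(lambda lines: '<br>'.join(lines))
```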
## make our final map and save it as a variable
## remake the final address column with <br> instead
## plot the final map
## use fig.write_html to save map