In this lesson, we'll learn how to use SQLAlchemy to write queries about Microsoft's Northwind Traders database!
You will be able to:
- Read and understand an ERD diagram
- Create queries with SQLAlchemy, including queries that involve many-to-many relationships
In order to complete this lab, we'll need to download a SQLite3-compatible version of the Northwind Traders database from Microsoft. Microsoft built this database back in the year 2000 to help showcase their SQL Server technology. Since then, it has been open-sourced and has become a great practice tool for every new generation of SQL learners.
Lucky for us, some generous programmers have already converted the Northwind database to a sqlite-compatible version and posted it on Github. We've already included the file the SQL database file that we'll be working with in the repo for this folder, along with the following ERD Diagram. However, if you would like to work with the larger version of this dataset at a future time, just clone this repo and follow their instructions to access it!
The following ERD Diagram describes the Northwind Traders Database:
If the text seems a bit hard to read inside this jupyter notebook, just go into the folder for this repo and open the Northwind_ERD.png
file manually to see it full size.
The first thing we'll need to do is connect to the Northwind Traders database, which can be found in the file Northwind_small.sqlite
.
In the cell below, import the necessary tools from the sqlalchemy
library, and take the necessary steps to connect to the database.
NOTE: We won't be modifying the information in the database at all, just querying it, so there's no need to import the various things you would need to create declarative base classes and the like.
import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker
engine = None
Session = None
session = None
One of the most useful things we can do when working with a new database is to inspect the tables until we have a solid idea of what we're looking at. As you work through this lab, you'll notice that there are some small discrepancies between the Table/Column names listed in the ERD and what they are actually are in the database. This may be annoying, but this is not an accident--sometimes, documentation is wrong! By learning how to inspect what tables exist in a database, as well as how which columns exist inside a table, we can save ourselves a lot of headaches by double checking.
In the cell below:
- Import
inspect
from sqlalchemy - Create an inspector object by passing in
engine
toinspect
- Use the appropriate inspector function to get the names of all tables
inspector = None
print(None)
Great! We can now see exactly what each Table is named.
Question:
Are there any discrepancies between the table names and the ERD diagrams? if so, what are they?
Write your answer below this line:
Let's inspect the column names on one of the tables as well. In the cell below, call the appropriate method to get all column names for the 'Employee'
table.
print(None)
That output is good, but its a bit messy. Let's write a function that makes it a bit more readable, and only tells us what we need to know.
The current structure of the output is a list of dictionaries. Complete the function below so that when the function is called and passed a table name, it prints out the name and type of each column in a well-formatted way.
def get_columns_info(col_name):
cols_list = None
print("Table Name: {}".format(None))
print("")
for None in None:
print("Name: {} \t Type: {}".format(None, None))
get_columns_info('Employee')
Table Name: Employee
Name: Id Type: INTEGER Name: LastName Type: VARCHAR(8000) Name: FirstName Type: VARCHAR(8000) Name: Title Type: VARCHAR(8000) Name: TitleOfCourtesy Type: VARCHAR(8000) Name: BirthDate Type: VARCHAR(8000) Name: HireDate Type: VARCHAR(8000) Name: Address Type: VARCHAR(8000) Name: City Type: VARCHAR(8000) Name: Region Type: VARCHAR(8000) Name: PostalCode Type: VARCHAR(8000) Name: Country Type: VARCHAR(8000) Name: HomePhone Type: VARCHAR(8000) Name: Extension Type: VARCHAR(8000) Name: Photo Type: BLOB Name: Notes Type: VARCHAR(8000) Name: ReportsTo Type: INTEGER Name: PhotoPath Type: VARCHAR(8000)
Sometimes, the easiest thing for us to do is to just execute a raw SQL statement. This is very easy with SQLAlchemy--we just need to establish a connection, and then use the appropriat methods to execute SQL statements!
In the cell below:
- Create a connection using the
engine
object's appropriate method and store it in the variablecon
- Use the appropriate method from
con
to execute a raw SQL statement (in the form of a string) that gets everything from the'Customer'
table with a LIMIT of 5. Store the results returned in the variablers
- Use the
fetchall()
method to display all results fromrs
.
con = None
rs = None
print(None)
[('ALFKI', 'Alfreds Futterkiste', 'Maria Anders', 'Sales Representative', 'Obere Str. 57', 'Berlin', 'Western Europe', '12209', 'Germany', '030-0074321', '030-0076545'), ('ANATR', 'Ana Trujillo Emparedados y helados', 'Ana Trujillo', 'Owner', 'Avda. de la Constitución 2222', 'México D.F.', 'Central America', '05021', 'Mexico', '(5) 555-4729', '(5) 555-3745'), ('ANTON', 'Antonio Moreno Taquería', 'Antonio Moreno', 'Owner', 'Mataderos 2312', 'México D.F.', 'Central America', '05023', 'Mexico', '(5) 555-3932', None), ('AROUT', 'Around the Horn', 'Thomas Hardy', 'Sales Representative', '120 Hanover Sq.', 'London', 'British Isles', 'WA1 1DP', 'UK', '(171) 555-7788', '(171) 555-6750'), ('BERGS', 'Berglunds snabbköp', 'Christina Berglund', 'Order Administrator', 'Berguvsvägen 8', 'Luleå', 'Northern Europe', 'S-958 22', 'Sweden', '0921-12 34 65', '0921-12 34 67')]
So far we've been able to easily connect to a SQL database, inspect the tables, and execute queries. However, the results returned from queries haven't been in an easily readable format. We'll fix that by taking the results and storing it in a pandas DataFrame!
In the cell below:
- Import pandas and set the standard alias.
- Create and execute a query that gets The firstname, lastname, and title of every person in the
'Employee'
table. - Create a pandas DataFrame out of the results returned from
rs.fetchall()
- Display the head of the new DataFRame
rs = None
df = None
df.head()
Nice! We can now read our results. However, the columns of our DataFrame aren't labeled. Luckily, pandas plays nicely with the sqlalchemy library, and can actually execute sql queries!
In the cell below:
- Use the appropriate method from pandas to select all columns for every row in the
[Order]
table where the customer is'VINET'
- Be sure to pass in our
engine
as the 2nd parameter, otherwise it won't work! - Display the head of the DataFrame created to ensure everything worked correctly.
df = None
df.head()
Great! As we can see from the output above, when we let pandas execute the SQL query for us, the DataFrame now contains columns with the correct labels. This is a great way to execute SQL while still making sure our results are easy to read and manipulate by using DataFrames!
Let's try executing a JOIN statement inside pd.read_sql_query
.
In the cell below:
- Write a query that gets the Order ID, Company Name, and the total count of orders made by each company (as num_orders).
- Group the results by Company Name
- Order the results by num_orders, descending
- Display the head of the DataFrame to ensure everything worked correctly.
df = None
df.head()
Great job! Let's see if we can execute a join that includes entities with a many-to-many relationship.
In the cell below:
- Write a query that selects the LastName, FirstName and number of territories assigned for every employee.
- Group the results by employee lastname
- Order by the total number of territories assigned to each employee, descending
- You'll need to make use a join table to solve this one--be sure to take a look at the ERD diagram again if needed!
- Store your results in a DataFrame and display the head to ensure that everything worked correctly.
NOTE: For long SQL statements, consider using the multiline string format in python, denoted by """three quotes"""
at the beginning and end. Note that if you hit enter to move to another line, be sure to add an \
character at the end of the line to escape it--otherwise, your sql statements will contain \n
newline characters wherever you hit enter to move to the next line.
q = None
df2 = pd.read_sql_query(None, None)
df2.head()
Great job! You've demonstrated proficiency using raw sql with SQLAlchemy. However, we haven't yet touched all the fun declarative stuff. Let's get some practice working with SQLAlchemy session
objects below!
So far, we've just been using SQLAlchemy as a way to connect to a database and run SQL queries. However, SQLAlchemy is an Object-Relational Mapper, and can map entities in our database to python objects! This can be incredibly helpful when we need to incorporate data from our database into an object-oriented program or model.
Let's start by getting some practice with session
objects, because that's where all the magic happens.
Recall that we created a session
object at the beginning of this lab by using SQLAlchemy's sessionmaker
function and binding it to our engine
object. We haven't used our session
object too much thus far, but now we'll use it for queries!
The session
object contains a .query()
method which returns a query object containing the results of our query, with the results mapped to objects.
Before we can make use of Object-Relational Mappings, we need to make sure that we have mappings created that map the tables in our existing database to objects in python. We don't want to have to do this manually, so we'll make use the automap
module inside of sqlalchemy.ext
.
In the cell below:
- Import
MetaData
fromsqlalchemy
- Import
automap_base
fromsqlalchemy.ext.automap
- Create a
MetaData
object - Use the metadata object's
reflect
method on ourengine
- Call
automap_base
and set themetadata
parameter to ourmetadata
. Store the results returned inside of the variableBase
- Call
base.prepare()
- Map
Employee
andCustomer
to theEmployee
andCustomer
classes, which can be found inside ofBase.classes
metadata = None
metadata.reflect(None)
Base = None
# Don't forget to .prepare()!
Employee, Customer = None, None
Now that we have some mappings set up, we can make use of session.query()
help us query our database!
Let's use the query()
object to get all the employees from the 'Employee'
table.
In the cell below:
- Create a for loop that iterates through the results returned by a
session.query()
of the Employee table (pass this as a variable, not a string). - Order the results by the Employee's
.HireDate
attribute. - Print the last name, first name, and hire date of each employee.
One great benefit of using session.query()
to query our data is that we can easily execute implicit joins by making use of the .filter()
method.
So far we've only explicitly specified mappings for the Employee
and Customer
classes. We'll need to do this now for the Product
and Category
classes before we can use them with session.query()
.
In the cell below, set the mappings for Product
and Category
.
HINT: This will look just like the last line of the code cell where we declared our mappings previously. No need to repeat all the steps for getting metadata and creating a Base
class. Just set the mappings!
Product, Category = None, None
Great!
Now, in the cell below:
- Create a for loop that iterates through all results returned from a query of Products and Categories
- Use the
.filter()
method to only include cases where the Product's.CategoryID
matches the Category's.Id
attribute. - Print out the name of each product, followed by the name of the category that it belongs to.
Great job! You've just used SQLAlchemy to work with a sample production database. Note that there are many, many more awesome things that SQLAlchemy can do, but they're outside the scope of this lesson. However, if you're interested in learning more, don't be afraid to take a look at the SQLAlchemy documentation and work through some tutorials in your spare time!