In this section, you will learn about several types of Join statements.
You will be able to:
- Compare and contrast the various types of joins
- Understand the structure of Join statements, and the role of foreign and primary keys in them
In almost all cases, rather then just working with a single table we will typically need data from multiple tables. Doing this requires the use of **joins ** using shared columns from the two tables. For example, here's a diagram of a mock customer relation management database.
import sqlite3
import pandas as pd
conn = sqlite3.connect('data.sqlite', detect_types=sqlite3.PARSE_COLNAMES)
cur = conn.cursor()
Let's say we need to generate some report that includes details about products from orders. To do that, we would need to take data from multiple tables in a single statement.
cur.execute("""select * from orderdetails
join products
on orderdetails.productCode = products.productCode
limit 10;
""")
df = pd.DataFrame(cur.fetchall()) #Take results and create dataframe
df.columns = [i[0] for i in cur.description]
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
orderNumber | productCode | quantityOrdered | priceEach | orderLineNumber | productCode | productName | productLine | productScale | productVendor | productDescription | quantityInStock | buyPrice | MSRP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10100 | S18_1749 | 30 | 136.00 | 3 | S18_1749 | 1917 Grand Touring Sedan | Vintage Cars | 1:18 | Welly Diecast Productions | This 1:18 scale replica of the 1917 Grand Tour... | 2724 | 86.70 | 170.00 |
1 | 10100 | S18_2248 | 50 | 55.09 | 2 | S18_2248 | 1911 Ford Town Car | Vintage Cars | 1:18 | Motor City Art Classics | Features opening hood, opening doors, opening ... | 540 | 33.30 | 60.54 |
2 | 10100 | S18_4409 | 22 | 75.46 | 4 | S18_4409 | 1932 Alfa Romeo 8C2300 Spider Sport | Vintage Cars | 1:18 | Exoto Designs | This 1:18 scale precision die cast replica fea... | 6553 | 43.26 | 92.03 |
3 | 10100 | S24_3969 | 49 | 35.29 | 1 | S24_3969 | 1936 Mercedes Benz 500k Roadster | Vintage Cars | 1:24 | Red Start Diecast | This model features grille-mounted chrome horn... | 2081 | 21.75 | 41.03 |
4 | 10101 | S18_2325 | 25 | 108.06 | 4 | S18_2325 | 1932 Model A Ford J-Coupe | Vintage Cars | 1:18 | Autoart Studio Design | This model features grille-mounted chrome horn... | 9354 | 58.48 | 127.13 |
cur.execute("""select * from orderdetails limit 10;""")
df = pd.DataFrame(cur.fetchall()) #Take results and create dataframe
df.columns = [i[0] for i in cur.description]
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
orderNumber | productCode | quantityOrdered | priceEach | orderLineNumber | |
---|---|---|---|---|---|
0 | 10100 | S18_1749 | 30 | 136.00 | 3 |
1 | 10100 | S18_2248 | 50 | 55.09 | 2 |
2 | 10100 | S18_4409 | 22 | 75.46 | 4 |
3 | 10100 | S24_3969 | 49 | 35.29 | 1 |
4 | 10101 | S18_2325 | 25 | 108.06 | 4 |
cur.execute("""select * from products limit 10;""")
df = pd.DataFrame(cur.fetchall()) #Take results and create dataframe
df.columns = [i[0] for i in cur.description]
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
productCode | productName | productLine | productScale | productVendor | productDescription | quantityInStock | buyPrice | MSRP | |
---|---|---|---|---|---|---|---|---|---|
0 | S10_1678 | 1969 Harley Davidson Ultimate Chopper | Motorcycles | 1:10 | Min Lin Diecast | This replica features working kickstand, front... | 7933 | 48.81 | 95.70 |
1 | S10_1949 | 1952 Alpine Renault 1300 | Classic Cars | 1:10 | Classic Metal Creations | Turnable front wheels; steering function; deta... | 7305 | 98.58 | 214.30 |
2 | S10_2016 | 1996 Moto Guzzi 1100i | Motorcycles | 1:10 | Highway 66 Mini Classics | Official Moto Guzzi logos and insignias, saddl... | 6625 | 68.99 | 118.94 |
3 | S10_4698 | 2003 Harley-Davidson Eagle Drag Bike | Motorcycles | 1:10 | Red Start Diecast | Model features, official Harley Davidson logos... | 5582 | 91.02 | 193.66 |
4 | S10_4757 | 1972 Alfa Romeo GTA | Classic Cars | 1:10 | Motor City Art Classics | Features include: Turnable front wheels; steer... | 3252 | 85.68 | 136.00 |
A more concise way to join the tables if the column name is identical is the using clauase.
cur.execute("""select * from orderdetails
join products
using(productCode)
limit 10;
""")
df = pd.DataFrame(cur.fetchall()) #Take results and create dataframe
df.columns = [i[0] for i in cur.description]
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
orderNumber | productCode | quantityOrdered | priceEach | orderLineNumber | productName | productLine | productScale | productVendor | productDescription | quantityInStock | buyPrice | MSRP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10100 | S18_1749 | 30 | 136.00 | 3 | 1917 Grand Touring Sedan | Vintage Cars | 1:18 | Welly Diecast Productions | This 1:18 scale replica of the 1917 Grand Tour... | 2724 | 86.70 | 170.00 |
1 | 10100 | S18_2248 | 50 | 55.09 | 2 | 1911 Ford Town Car | Vintage Cars | 1:18 | Motor City Art Classics | Features opening hood, opening doors, opening ... | 540 | 33.30 | 60.54 |
2 | 10100 | S18_4409 | 22 | 75.46 | 4 | 1932 Alfa Romeo 8C2300 Spider Sport | Vintage Cars | 1:18 | Exoto Designs | This 1:18 scale precision die cast replica fea... | 6553 | 43.26 | 92.03 |
3 | 10100 | S24_3969 | 49 | 35.29 | 1 | 1936 Mercedes Benz 500k Roadster | Vintage Cars | 1:24 | Red Start Diecast | This model features grille-mounted chrome horn... | 2081 | 21.75 | 41.03 |
4 | 10101 | S18_2325 | 25 | 108.06 | 4 | 1932 Model A Ford J-Coupe | Vintage Cars | 1:18 | Autoart Studio Design | This model features grille-mounted chrome horn... | 9354 | 58.48 | 127.13 |
Alternatively, you can also alias tables by giving them an alternative shorthand name directly after them. Here we use the aliases 'o' and 'p' for orderdetails and products respectively.
cur.execute("""select * from orderdetails o
join products p
on o.productCode = p.productCode
limit 10;
""")
df = pd.DataFrame(cur.fetchall()) #Take results and create dataframe
df.columns = [i[0] for i in cur.description]
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
orderNumber | productCode | quantityOrdered | priceEach | orderLineNumber | productCode | productName | productLine | productScale | productVendor | productDescription | quantityInStock | buyPrice | MSRP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10100 | S18_1749 | 30 | 136.00 | 3 | S18_1749 | 1917 Grand Touring Sedan | Vintage Cars | 1:18 | Welly Diecast Productions | This 1:18 scale replica of the 1917 Grand Tour... | 2724 | 86.70 | 170.00 |
1 | 10100 | S18_2248 | 50 | 55.09 | 2 | S18_2248 | 1911 Ford Town Car | Vintage Cars | 1:18 | Motor City Art Classics | Features opening hood, opening doors, opening ... | 540 | 33.30 | 60.54 |
2 | 10100 | S18_4409 | 22 | 75.46 | 4 | S18_4409 | 1932 Alfa Romeo 8C2300 Spider Sport | Vintage Cars | 1:18 | Exoto Designs | This 1:18 scale precision die cast replica fea... | 6553 | 43.26 | 92.03 |
3 | 10100 | S24_3969 | 49 | 35.29 | 1 | S24_3969 | 1936 Mercedes Benz 500k Roadster | Vintage Cars | 1:24 | Red Start Diecast | This model features grille-mounted chrome horn... | 2081 | 21.75 | 41.03 |
4 | 10101 | S18_2325 | 25 | 108.06 | 4 | S18_2325 | 1932 Model A Ford J-Coupe | Vintage Cars | 1:18 | Autoart Studio Design | This model features grille-mounted chrome horn... | 9354 | 58.48 | 127.13 |
Above, we have only been doing inner joins which is the intersection of the two tables. There are many other types of joins, displayed below. Of these, sqlite does not support outer joins, but it is good to be aware of as more powerful versions of sql such as postgresql support these additional functions.
For example, the statement
select * from products left join orderdetails;
would return all products, even those that hadn't been ordered. We can imagine that all products in inventory should have a description in the product table, but perhaps not every product is represented in the orderdetails table.
cur.execute("""select * from products
left join orderdetails
using(productCode);
""")
df = pd.DataFrame(cur.fetchall()) #Take results and create dataframe
df.columns = [i[0] for i in cur.description]
print(len(df))
print(len(df[df.orderNumber.isnull()]))
df[df.orderNumber.isnull()].head()
2997
1
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
productCode | productName | productLine | productScale | productVendor | productDescription | quantityInStock | buyPrice | MSRP | orderNumber | quantityOrdered | priceEach | orderLineNumber | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1122 | S18_3233 | 1985 Toyota Supra | Classic Cars | 1:18 | Highway 66 Mini Classics | This model features soft rubber tires, working... | 7733 | 57.01 | 107.57 | None | None | None | None |
As you can see, its rare, but there is one product that has yet to be ordered
Another important consideration when performing joins is to think more about the key or column you are joining on. As we'll see in upcoming lessons, this can lead to interesting behavior if the join value is not unique in one or both of the tables. In all of the above examples, we joined two tables using the primary key. The primary key(s) of a table are those column(s) which uniquely identify a row. You'll also see this designated in our schema diagram with the asterisk (*).
We can also join tables using foreign keys which are not the primary key for that particular table, but rather another table. For example, employeeNumber is the primary key for the employees table and corresponds to the salesRepEmployeeNumber of the customers table. In the customers table, salesRepEmployeeNumber is only a foreign key, and is unlikely to be a unique identifier, as it is likely that an employee serves multiple customers. As such, in the resulting view, employeeNumber would no longer be a unique field.
cur.execute("""select * from customers c
join employees e
on c.salesRepEmployeeNumber = e.employeeNumber
order by employeeNumber;
""")
df = pd.DataFrame(cur.fetchall()) #Take results and create dataframe
df.columns = [i[0] for i in cur.description]
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
customerNumber | customerName | contactLastName | contactFirstName | phone | addressLine1 | addressLine2 | city | state | postalCode | ... | salesRepEmployeeNumber | creditLimit | employeeNumber | lastName | firstName | extension | officeCode | reportsTo | jobTitle | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 124 | Mini Gifts Distributors Ltd. | Nelson | Susan | 4155551450 | 5677 Strong St. | San Rafael | CA | 97562 | ... | 1165 | 210500.00 | 1165 | Jennings | Leslie | x3291 | ljennings@classicmodelcars.com | 1 | 1143 | Sales Rep | |
1 | 129 | Mini Wheels Co. | Murphy | Julie | 6505555787 | 5557 North Pendale Street | San Francisco | CA | 94217 | ... | 1165 | 64600.00 | 1165 | Jennings | Leslie | x3291 | ljennings@classicmodelcars.com | 1 | 1143 | Sales Rep | |
2 | 161 | Technics Stores Inc. | Hashimoto | Juri | 6505556809 | 9408 Furth Circle | Burlingame | CA | 94217 | ... | 1165 | 84600.00 | 1165 | Jennings | Leslie | x3291 | ljennings@classicmodelcars.com | 1 | 1143 | Sales Rep | |
3 | 321 | Corporate Gift Ideas Co. | Brown | Julie | 6505551386 | 7734 Strong St. | San Francisco | CA | 94217 | ... | 1165 | 105000.00 | 1165 | Jennings | Leslie | x3291 | ljennings@classicmodelcars.com | 1 | 1143 | Sales Rep | |
4 | 450 | The Sharp Gifts Warehouse | Frick | Sue | 4085553659 | 3086 Ingle Ln. | San Jose | CA | 94217 | ... | 1165 | 77600.00 | 1165 | Jennings | Leslie | x3291 | ljennings@classicmodelcars.com | 1 | 1143 | Sales Rep |
5 rows × 21 columns
Notice that this also returned both columns: salesRepEmployeeNumber and employeeNumber.
In this lesson we investigated joins including the on and using clause, aliasing table names, left joins and primary and foreign keys.