Previously, you've learned about the typical case where one joins on a primary or foreign key. In this section, you'll explore other types of joins using One-to-Many and Many-to-many relationships!
You will be able to:
- Explain why Join Tables are needed in Many-to-Many relationships
We've looked at a couple kinds of different join statements: left joins and inner joins. Both of these refer to the way in which we would like to define our joins based on the tables and their shared information. Another perspective on this is the number of matches between the tables based on our defined links with the keywords on or using.
We've investigated the typical case where one joins on a primary or foreign key. For example, when we join on customerID or employeeID, this value should be unique to that table. As such, our joins have been very similar to using a dictionary to find additional information associated with that record. In cases where there are multiple entries, in either table, for the field you are joining on, you will similarly be given multiple rows in your resulting view, one for each of these entries.
For example, let's say you have another table 'restaurants' that has many columns including name, city, and rating. If you were to join this table with the offices table using the shared city column, you might get some unexpected behavior. That is, in the office table, there is only one office per city. However, because there is apt to be more then one restaurant for each of these cities in our second table, we will get unique combinations of Offices and Restaurants from our join. If there are 513 restaurants for Boston in our restaurant table and 1 office for Boston, our joined table will have each of these 513 rows, one for each restaurant along with the one office.
If we had 2 offices for Boston, and 513 restaurants, our join would have 1026 rows for boston; 513 for each restuarant along with the first office and 513 for each restaurant with the second office. Three offices in Boston would similarly produce 1539 rows; one for each unique combination of restaurants and offices. This is where you should be particularly careful of many to many joins as the resulting set size can explode drastically potentially consuming vast amounts of memory and other resources.
import sqlite3
import pandas as pd
conn = sqlite3.connect('data.sqlite', detect_types=sqlite3.PARSE_COLNAMES)
cur = conn.cursor()
cur.execute('select * from offices;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 7
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
officeCode | city | phone | addressLine1 | addressLine2 | state | country | postalCode | territory | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | San Francisco | +1 650 219 4782 | 100 Market Street | Suite 300 | CA | USA | 94080 | NA |
1 | 2 | Boston | +1 215 837 0825 | 1550 Court Place | Suite 102 | MA | USA | 02107 | NA |
2 | 3 | NYC | +1 212 555 3000 | 523 East 53rd Street | apt. 5A | NY | USA | 10022 | NA |
3 | 4 | Paris | +33 14 723 4404 | 43 Rue Jouffroy D'abbans | France | 75017 | EMEA | ||
4 | 5 | Tokyo | +81 33 224 5000 | 4-1 Kioicho | Chiyoda-Ku | Japan | 102-8578 | Japan |
cur.execute('select * from employees;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 23
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
employeeNumber | lastName | firstName | extension | officeCode | reportsTo | jobTitle | ||
---|---|---|---|---|---|---|---|---|
0 | 1002 | Murphy | Diane | x5800 | dmurphy@classicmodelcars.com | 1 | President | |
1 | 1056 | Patterson | Mary | x4611 | mpatterso@classicmodelcars.com | 1 | 1002 | VP Sales |
2 | 1076 | Firrelli | Jeff | x9273 | jfirrelli@classicmodelcars.com | 1 | 1002 | VP Marketing |
3 | 1088 | Patterson | William | x4871 | wpatterson@classicmodelcars.com | 6 | 1056 | Sales Manager (APAC) |
4 | 1102 | Bondur | Gerard | x5408 | gbondur@classicmodelcars.com | 4 | 1056 | Sale Manager (EMEA) |
cur.execute('select * from offices join employees using(officeCode);')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 23
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
officeCode | city | phone | addressLine1 | addressLine2 | state | country | postalCode | territory | employeeNumber | lastName | firstName | extension | reportsTo | jobTitle | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | San Francisco | +1 650 219 4782 | 100 Market Street | Suite 300 | CA | USA | 94080 | NA | 1002 | Murphy | Diane | x5800 | dmurphy@classicmodelcars.com | President | |
1 | 1 | San Francisco | +1 650 219 4782 | 100 Market Street | Suite 300 | CA | USA | 94080 | NA | 1056 | Patterson | Mary | x4611 | mpatterso@classicmodelcars.com | 1002 | VP Sales |
2 | 1 | San Francisco | +1 650 219 4782 | 100 Market Street | Suite 300 | CA | USA | 94080 | NA | 1076 | Firrelli | Jeff | x9273 | jfirrelli@classicmodelcars.com | 1002 | VP Marketing |
3 | 1 | San Francisco | +1 650 219 4782 | 100 Market Street | Suite 300 | CA | USA | 94080 | NA | 1143 | Bow | Anthony | x5428 | abow@classicmodelcars.com | 1056 | Sales Manager (NA) |
4 | 1 | San Francisco | +1 650 219 4782 | 100 Market Street | Suite 300 | CA | USA | 94080 | NA | 1165 | Jennings | Leslie | x3291 | ljennings@classicmodelcars.com | 1143 | Sales Rep |
Here we join products with product lines. There are only a few product lines that will be matched to each product. As a result, the product line descriptions will be repeated in our resulting view.
cur.execute('select * from products;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 110
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
productCode | productName | productLine | productScale | productVendor | productDescription | quantityInStock | buyPrice | MSRP | |
---|---|---|---|---|---|---|---|---|---|
0 | S10_1678 | 1969 Harley Davidson Ultimate Chopper | Motorcycles | 1:10 | Min Lin Diecast | This replica features working kickstand, front... | 7933 | 48.81 | 95.70 |
1 | S10_1949 | 1952 Alpine Renault 1300 | Classic Cars | 1:10 | Classic Metal Creations | Turnable front wheels; steering function; deta... | 7305 | 98.58 | 214.30 |
2 | S10_2016 | 1996 Moto Guzzi 1100i | Motorcycles | 1:10 | Highway 66 Mini Classics | Official Moto Guzzi logos and insignias, saddl... | 6625 | 68.99 | 118.94 |
3 | S10_4698 | 2003 Harley-Davidson Eagle Drag Bike | Motorcycles | 1:10 | Red Start Diecast | Model features, official Harley Davidson logos... | 5582 | 91.02 | 193.66 |
4 | S10_4757 | 1972 Alfa Romeo GTA | Classic Cars | 1:10 | Motor City Art Classics | Features include: Turnable front wheels; steer... | 3252 | 85.68 | 136.00 |
cur.execute('select * from productlines;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 7
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
productLine | textDescription | htmlDescription | image | |
---|---|---|---|---|
0 | Classic Cars | Attention car enthusiasts: Make your wildest c... | ||
1 | Motorcycles | Our motorcycles are state of the art replicas ... | ||
2 | Planes | Unique, diecast airplane and helicopter replic... | ||
3 | Ships | The perfect holiday or anniversary gift for ex... | ||
4 | Trains | Model trains are a rewarding hobby for enthusi... |
cur.execute("""select * from products
join productlines
using(productLine);""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 110
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
productCode | productName | productLine | productScale | productVendor | productDescription | quantityInStock | buyPrice | MSRP | textDescription | htmlDescription | image | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | S10_1678 | 1969 Harley Davidson Ultimate Chopper | Motorcycles | 1:10 | Min Lin Diecast | This replica features working kickstand, front... | 7933 | 48.81 | 95.70 | Our motorcycles are state of the art replicas ... | ||
1 | S10_1949 | 1952 Alpine Renault 1300 | Classic Cars | 1:10 | Classic Metal Creations | Turnable front wheels; steering function; deta... | 7305 | 98.58 | 214.30 | Attention car enthusiasts: Make your wildest c... | ||
2 | S10_2016 | 1996 Moto Guzzi 1100i | Motorcycles | 1:10 | Highway 66 Mini Classics | Official Moto Guzzi logos and insignias, saddl... | 6625 | 68.99 | 118.94 | Our motorcycles are state of the art replicas ... | ||
3 | S10_4698 | 2003 Harley-Davidson Eagle Drag Bike | Motorcycles | 1:10 | Red Start Diecast | Model features, official Harley Davidson logos... | 5582 | 91.02 | 193.66 | Our motorcycles are state of the art replicas ... | ||
4 | S10_4757 | 1972 Alfa Romeo GTA | Classic Cars | 1:10 | Motor City Art Classics | Features include: Turnable front wheels; steer... | 3252 | 85.68 | 136.00 | Attention car enthusiasts: Make your wildest c... |
If we join the employees and offices table, we will have a view with repeat cities listed. (Recall this was 23 rows, one for each employee. Joining this with the customer table on the cities column could cause us to have a huge number of rows, one for each employee and customer combination for a given city.) In this particular example, our results are limited as this mock database is much smaller then is apt to happen in practice. However, it is important to conceptualize the potential impact of ill conceived joins, as severe load can be put on the database causing slow execution time, and potentially even tying up database resources for other analysts who may be using the system.
cur.execute("""select * from employees
join offices
using(officeCode)
join customers
using(city);""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 45
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
employeeNumber | lastName | firstName | extension | officeCode | reportsTo | jobTitle | city | phone | ... | contactLastName | contactFirstName | phone | addressLine1 | addressLine2 | state | postalCode | country | salesRepEmployeeNumber | creditLimit | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1002 | Murphy | Diane | x5800 | dmurphy@classicmodelcars.com | 1 | President | San Francisco | +1 650 219 4782 | ... | Murphy | Julie | 6505555787 | 5557 North Pendale Street | CA | 94217 | USA | 1165 | 64600.00 | ||
1 | 1002 | Murphy | Diane | x5800 | dmurphy@classicmodelcars.com | 1 | President | San Francisco | +1 650 219 4782 | ... | Brown | Julie | 6505551386 | 7734 Strong St. | CA | 94217 | USA | 1165 | 105000.00 | ||
2 | 1056 | Patterson | Mary | x4611 | mpatterso@classicmodelcars.com | 1 | 1002 | VP Sales | San Francisco | +1 650 219 4782 | ... | Murphy | Julie | 6505555787 | 5557 North Pendale Street | CA | 94217 | USA | 1165 | 64600.00 | |
3 | 1056 | Patterson | Mary | x4611 | mpatterso@classicmodelcars.com | 1 | 1002 | VP Sales | San Francisco | +1 650 219 4782 | ... | Brown | Julie | 6505551386 | 7734 Strong St. | CA | 94217 | USA | 1165 | 105000.00 | |
4 | 1076 | Firrelli | Jeff | x9273 | jfirrelli@classicmodelcars.com | 1 | 1002 | VP Marketing | San Francisco | +1 650 219 4782 | ... | Murphy | Julie | 6505555787 | 5557 North Pendale Street | CA | 94217 | USA | 1165 | 64600.00 |
5 rows × 28 columns
cur.execute("""select * from employees;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 23
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
employeeNumber | lastName | firstName | extension | officeCode | reportsTo | jobTitle | ||
---|---|---|---|---|---|---|---|---|
0 | 1002 | Murphy | Diane | x5800 | dmurphy@classicmodelcars.com | 1 | President | |
1 | 1056 | Patterson | Mary | x4611 | mpatterso@classicmodelcars.com | 1 | 1002 | VP Sales |
2 | 1076 | Firrelli | Jeff | x9273 | jfirrelli@classicmodelcars.com | 1 | 1002 | VP Marketing |
3 | 1088 | Patterson | William | x4871 | wpatterson@classicmodelcars.com | 6 | 1056 | Sales Manager (APAC) |
4 | 1102 | Bondur | Gerard | x5408 | gbondur@classicmodelcars.com | 4 | 1056 | Sale Manager (EMEA) |
cur.execute("""select * from customers;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 122
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
customerNumber | customerName | contactLastName | contactFirstName | phone | addressLine1 | addressLine2 | city | state | postalCode | country | salesRepEmployeeNumber | creditLimit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 103 | Atelier graphique | Schmitt | Carine | 40.32.2555 | 54, rue Royale | Nantes | 44000 | France | 1370 | 21000.00 | ||
1 | 112 | Signal Gift Stores | King | Jean | 7025551838 | 8489 Strong St. | Las Vegas | NV | 83030 | USA | 1166 | 71800.00 | |
2 | 114 | Australian Collectors, Co. | Ferguson | Peter | 03 9520 4555 | 636 St Kilda Road | Level 3 | Melbourne | Victoria | 3004 | Australia | 1611 | 117300.00 |
3 | 119 | La Rochelle Gifts | Labrune | Janine | 40.67.8555 | 67, rue des Cinquante Otages | Nantes | 44000 | France | 1370 | 118200.00 | ||
4 | 121 | Baane Mini Imports | Bergulfsen | Jonas | 07-98 9555 | Erling Skakkes gate 78 | Stavern | 4110 | Norway | 1504 | 81700.00 |
In this section, you expanded your Join knowledge to One-to-Many and Many-to-many Joins!