/sql-references-join

An Introduction to PostgreSQL Foreign Keys

OtherNOASSERTION

General Assembly Logo

An Introduction to PostgreSQL Foreign Keys

Instructions

Fork and clone this repository.

Objectives

By the end of this lesson, students should be able to:

  • Add a foreign key reference to an existing table.
  • Create a table with a foreign key reference.
  • Insert a row which includes a reference to the id of a row in another table.
  • Update a row setting a reference to the id of a row in another table.
  • Retrieve rows from two tables using a JOIN condition

Prerequisites

Putting the 'Relation' in 'Relational Database'

In conversations about Postgres, SQL and the like, you may hear the term relational database thrown around. In the previous material on SQL, you learned how to create, modify, and destroy rows and tables. How do 'relationships' (whatever that means) fit into that context?

Suppose that we had two separate tables of information in our database, developers and lunches (see below). Each developer brings their own lunch, and none of them want to eat each others' lunches, so we have to make sure that each lunch lines up with the right developer. How might we do that?

developers

id given_name surname favorite language
1 Antony Donovan C
2 Jason Weeks JavaScript
3 Jeff Horn Ruby
4 Matt Brendzel LOLCODE

lunches

id main_course side_dish
1 salmon and tuna sushi rolls chili
2 cheese sandwich on gluten-free bread salad
3 roast beef sandwich chips
4 chicken sandwich steamed vegetables

What if we were to put nametags on each of the lunches, so that we could know which developer brought which lunch?

lunches

id developer main_course side_dish
1 Jeff salmon and tuna sushi rolls chili
2 Antony cheese sandwich on gluten-free bread salad
3 Matt roast beef sandwich chips
4 Jason chicken sandwich steamed vegetables

We've now associated (i.e. related) each lunch record with a developer record. But what if another developer with a duplicate name joins the mix? It might be better to use something unique, like the id column, instead.

id developer_id main_course side_dish
1 3 salmon and tuna sushi rolls chili
2 1 cheese sandwich on gluten-free bread salad
3 4 roast beef sandwich chips
4 2 chicken sandwich steamed vegetables

The developer_id column refers to data in the developers table, but it's actually a column in the lunches table. This is what's known as a foreign key.

In terms of actual implementation in an RDBMS ( relational database management system ), a column can be defined as holding foreign keys using a modifier on a table definition called a constraint. Some examples of constraints are below.

  • NOT NULL
  • UNIQUE
  • PRIMARY KEY
  • FOREIGN KEY
  • CHECK

Each of these constraints allows you to put some bounds on the values that can be put into specific columns. The FOREIGN KEY constraint in particular makes sure that values in that column are always valid id values in the table that the column refers to.

When adding a FOREIGN KEY constraint to a column, an INDEX constraint is also usually added to that same column in order to speed access to matched rows. The combination of FOREIGN KEY and INDEX tells the RDBMS how you intend to use the tables you've related.

In the 'developers and lunches' example, one lunch was associated with one developer. This is called a 1 to 1 relationship. However, there are several other possible arangements, including 1 to (0 or 1), 1 to (0+), and (0+) to (0+). The last two are frequently called 'one-to-many' and 'many-to-many'; we'll look at the first of these now, and the second in later materials.

Setup

Code Along : Create a Database

Let's create a working database for the next few exercises using the CREATE DATABASE command; we'll name it sql-join, and base it on the sql-crud database from the previous SQL material.

bash

psql

psql

psql (9.4.5)
Type "help" for help.

wdi=> CREATE DATABASE "sq-join" TEMPLATE "sql-crud";
CREATE DATABASE
wdi=> \c sql-join
You are now connected to database "sql-join" as user "wdi".
sql-join=>

Create a Foreign Key

Demo : Create a Foreign Key

Take a look at how a foreign key can be added to an existing table: people. Specifically, we'll add a reference to the city in which each person was born.

Adding a new foreign key column is just like adding any other new column -- it's an ALTER TABLE operation.

Conventionally, a foreign key is named for the singular of the name of the table being referenced, with the column being referenced appended after an underscore. So, if we're adding a reference to the cities table and its id column we'll create a column called city_id. However, this convention should not be followed when there is a semantically superior name available. In this case, born_in_id is a more appropriate name than city_id for the column.

Code Along : Create a Foreign Key

Let's try creating some more foreign keys. First, let's create an addresses table which references the cities table.

In the data directory, there's a CSV file called addresses.csv with a lot of address data in it. The columns are 'no' (i.e. number) and 'name', so let's represent these columns as INTEGER and CHARACTER VARYING / VARCHAR, respectively. As mentioned, we also want each address to reference the city that it's in, so let's add another column to do that.

CREATE TABLE addresses(
  id SERIAL PRIMARY KEY,
  no INTEGER,
  name VARCHAR, -- CHARACTER VARYING
  city_id INTEGER REFERENCES cities -- defaults to (id)
);

Let's do another; this time we'll add a reference from the people table to the addresses table.

ALTER TABLE people
  ADD COLUMN address_id INTEGER REFERENCES addresses
;

Lab : Create a Foreign Key

Write SQL code inside alter_table/pets.sql that adds an owner reference to the pets table.


Relate Rows in Different Tables

Demo : Relate Rows in Different Tables

Now that we've created some foreign key columns, it's possible to insert new rows into those tables that reference other tables, or even to update existing rows and add new references that way. We could easily do this with the born_in_id column we just created.

Note that a foreign key constraint will disallow invalid values in the referencing column.

Code Along : Relate Rows in Different Tables

We'll start by inserting a new address into the addresses table that's associated with Somerville.

INSERT INTO addresses(no,name, city_id)
  VALUES (255, 'Elm Street', 1)
  -- In `cities`, Somerville has an ID of 1
;

Next, let's load up all the addresses from addresses.csv; once those are loaded, we can then update the people table by associating people with some of those new addresses.

UPDATE people AS p        -- alias `people` as p
  SET address_id = a.id
    FROM addresses AS a   -- alias `addresses` as a
      WHERE a.id = p.id   -- arbitarily associate person 1 with address 1
;

Lab : Relate Rows in Different Tables

Define owner references for some existing pets. Then, pick at least two people to be folks with too many pets -- these people should have large numbers of pets associated with them. Finally, for each one of these 'pet hoarders', add a new pet and associate it with a hoarder.


Read Data Across Related Tables

Demo : Read Data Across Related Tables

Now that our rows are related, it would be nice if we could read across multiple tables at once - for instance, to see how many people lived in some particular city.

One possible way to accomplish this is to add a special clause, called a JOIN clause, to a SELECT command; this allows queries to return associated data from two tables as a single row.

SELECT c.name, COUNT(*)
  FROM people p
  INNER JOIN cities c ON p.born_in_id = c.id
    GROUP BY c.name
    HAVING COUNT(*) > 1
  -- list cities by how many people were born there
  -- and only show cities where more than one person was born
;

Depending on the type of JOIN we use, we can grab different sets of rows from one table or the other. An 'inner join' of two tables, A and B, grabs only those rows in table A that have a matching row in table B, and vice versa. Thus, if we're using an INNER JOIN, we can even reverse the order of the tables without changing anything.

SELECT c.name, COUNT(*)
  FROM cities c
  INNER JOIN people p ON p.born_in_id = c.id
    GROUP BY c.name
    HAVING COUNT(*) > 1
  -- list cities by how many people were born there
  -- and only show cities where more than one person was born
;

We could also look at people born in a particular city.

SELECT p.given_name, p.surname
  FROM people p
  INNER JOIN cities c ON p.born_in_id = c.id
    WHERE c.name = 'Somerville'
;

Code Along : Read Data Across Related Tables

Let's write some queries that focus on people, addresses, and cities.

We'll run the script in update/addresses.sql to arbitrarily associates addresses with cities.

To get a list of all people, along with their address and city, we could write

SELECT p.surname, p.given_name, a.name AS street, c.name AS city, c.country
  FROM people p
  INNER JOIN addresses a  ON p.address_id = a.id
  INNER JOIN cities c     ON a.city_id = c.id
;

To limit the results to only those people living in Germany, we could write:

SELECT p.surname, p.given_name, a.name AS street, c.name AS city, c.country
  FROM people p
  INNER JOIN addresses a  ON p.address_id = a.id
  INNER JOIN cities c     ON a.city_id = c.id
    WHERE c.country = 'DE'
;

Lab : Read Data Across Related Tables

Run this SQL code to arbitrarily assign pets to people.

UPDATE pets
  SET owner_id = 7
    WHERE kind = 'fish' AND name LIKE 'S%'
;

UPDATE pets
  SET owner_id = 11
    WHERE kind = 'bird'
;

UPDATE pets
  SET owner_id = 21
   WHERE kind = 'cat'
;

UPDATE pets
  SET owner_id = 42
    WHERE kind = 'dog'
;

Now use SELECT to come up with a list of all of the hoarders - that's anyone with more than 3 dogs, 4 birds, 5 cats, or 20 fish.

HINT: Look up UNION and see what it does.


References

  • Constraints - An overview of the variety of constraints that PostgreSQL provides.
  • CREATE TABLE - detailed documentation of PostgreSQL's version of the SQL CREATE TABLE command.
  • ALTER TABLE - detailed documentation of PostgreSQL's version of the SQL ALTER TABLE command.
  • Index Introduction - The introductory section of the chapter on indexes in PostgreSQL.
  • CREATE INDEX - detailed documentation of PostgreSQL's version of the SQL CREATE INDEX command.
  • UPDATE - detailed documentation of PostgreSQL's version of the SQL UPDATE command.
  • INSERT - detailed documentation of PostgreSQL's version of the SQL INSERT INTO command.
  • Joins Between Tables - An introduction to querying multiple tables
  • SELECT - detailed documentation of PostgreSQL's version of the SQL SELECT command.