In the previous material on SQL we used the phrase "relational database" but didn't delve into what that means.
We'll begin to cover the topic of "relationships" in database terms now.
By the end of this, developers should be able to:
- Add a foreign key reference to an existing table.
- Update a row setting a reference to the id of a row in another table.
- Insert a row which includes a reference to the id of a row in another table.
- Retrieve rows from two tables using a
JOIN
condition
- Fork and clone this repository.
- Create a new branch,
training
, for your work. - Install dependencies if necessary.
Previously we created tables to hold books for a library, patients for a clinic, and ingredients for a cookbook.
- In our library, authors are stored as attributes of books. Does that make sense?
- Who does our clinic need to care for the patients?
- And what does our cookbook need that includes ingredients?
Let's model (diagram) these new objects (things) and their relationship to our existing objects (things).
We'll create SQL scripts in scripts/library
to add an authors
table and
populate it from data in the books
table.
We'll create scripts in scripts/clinic
to add a doctors
table and populate
it from data/doctors.csv
.
We'll create scripts in scripts/cookbook
to add a recipes
table and populate
using INSERT
statements.
Conventionally, a foreign key reference is named for the singular of the name of
the table being referenced, with the column being referenced appended after an
underscore. So if we're adding a reference to the cities
table and its id
column we'll create a column called city_id
. However, this convention should
not be followed when there is a semantically superior name available.
Can you give an example of when this convention isn't appropriate?
We'll create scripts in scripts/library
to
- on the
books
table, add a reference to theid
column of theauthors
table. - populate the author reference in the
books
table. - remove the
author
column from the books table.
Note that references may constrain both the referenced and referencing table.
We'll create scripts in scripts/clinic
to
- on the
patients
table, add a reference to theid
column of thedoctors
table. - populate the doctor reference in the
patients
table for a few patients.
We'll create scripts in scripts/cookbook
to
- on the ingredients table, add a reference to the id column of the recipes table.
- populate the recipe reference in the
ingredients
table for a few ingredients.
Does this relationship seem adequate for handling actual recipes? Why or why not?
We'll create scripts in scripts/library
to retrieve books by a set of authors
and authors of certain books.
What happens if we try to DELETE
an author?
We'll create scripts in scripts/clinic
to retrieve patients of certain doctors
and doctors for some patients.
We'll create scripts in scripts/cookbook
to retrieve ingredients for a recipe.
Later we'll see how to connect multiple recipes to the same ingredients.
Imagine we have filling cabinets and paper spreadsheets instead of a digital database. You're tasked with assembling information from two paper spreadsheets. You have a sheet of authors with ids and a sheet of books with author_ids. You must answer the question: "Which books were written by either Ernest Hemingway or Shirley Jackson?".
You first take the sheet with authors, find the each author and note their ids on a third sheet. Then, you look at the books sheet and scan for Hemingway's id to appear as author id. Each time you come across a matching row, you write all the data from the books sheet, along with Hemingway's name, on your third sheet. Then you repeat the process again for Jackson.
When you're done, you have a third sheet that's an artifact of the process. It can be recreated at any time by following the steps you just went through. You take your two original sheets and put them back in the filing cabinet for the next person to use.
The process of collating the information from each of the two sheets to the third is called a JOIN, and that's why our join clause matches the id from one table with the foreign key from another.
The resulting third sheet can be called a report, and it represents a particular presentation of data from two different tables. When we run queries, we generate reports. The viewing of data and storing of data in SQL are different. The report is not a table, and the table is not a report.
- Constraints - An overview of the variety of constraints that PostgreSQL provides.
- CREATE TABLE -
detailed documentation of PostgreSQL's version of
the SQL
CREATE TABLE
command. - ALTER TABLE -
detailed documentation of PostgreSQL's version of the
SQL
ALTER TABLE
command. - Index Introduction - The introductory section of the chapter on indexes in PostgreSQL.
- CREATE INDEX -
detailed documentation of PostgreSQL's version of the
SQL
CREATE INDEX
command. - UPDATE -
detailed documentation of PostgreSQL's version of the SQL
UPDATE
command. - INSERT -
detailed documentation of PostgreSQL's version of the
SQL
INSERT INTO
command. - Joins Between Tables - An introduction to querying multiple tables
- SELECT -
detailed documentation of PostgreSQL's version of the SQL
SELECT
command.
- All content is licensed under a CCBYNCSA 4.0 license.
- All software code is licensed under GNU GPLv3. For commercial use or alternative licensing, please contact legal@ga.co.