Author: @tbtommyb
Maintainer: TBC
This workshop is designed to build your confidence in querying data using SQL.
We will be working with the dataset in the data.sql
file. This file contains a set of SQL commands that will create a set of tables and fill them with data. We will connect to the PostgreSQL server running locally on our individual computers and tell it to run the file.
Make sure that you have correctly installed PostgreSQL according to these instructions. Check that you can connect to your locally-running database by running psql
from the command line.
If you run into problems, on a Mac using Homebrew, run brew services restart postgresql
and try to connect again. On Ubuntu, run sudo service postgresql restart
and try to connect again.
Please download the file and navigate in the Terminal to its location.
Now run the command psql --file=data.sql
in a new Terminal window/tab.
If it doesn't work try psql -f data.sql
. If it still doesn't work then holler.
This will connect to your PostgreSQL server and run all of the SQL in data.sql
, setting up our database for us.
We mentioned above that PostgreSQL using a server-client model. Currently we're running both on the same machine but if we wanted to we could have the server running on a different computer and connect to it via the client. We will cover this soon.
Now that we're set up, we can connect to our newly-created database by running psql
(if it gives an 'access denied' error, try psql -U [your-user-name]
).
Slightly confusingly, psql
has its own set of commands that are entirely different from SQL. You can identify them because they start with a backwards slash () and don't end in a semicolon.
Once you are in psql
try some of the following commands:
\d
- list all tables (know as 'relations' in psql)
\d [table name]
- give information on a given table
\l
- list all databases
Before we jump into the challenges, here are a few syntax points to be aware of:
Don't forget to use semicolons at the end of SQL commands. If you hit enter and you just get empty lines this is probably what you're missing.
In PostgreSQL words in double quotes mean identifiers like the names of tables and columns. Single quotes are used for values. If you do something like:
SELECT * FROM authors WHERE first_name = "Sharon"
You'll get an error because PostgreSQL thinks "Sharon" is the name of a column. Use single quotes so that PostgreSQL knows it's a value. You can optionally put double quotes around authors
and first_name
, but single quotes won't work (because they are the names of identifiers within our database).
Note that a single equals is used for equality testing, not assignment.
SQL keywords like SELECT
, WHERE
etc can be in upper or lower case. The convention is upper case to distinguish them from identifiers and values but PostgreSQL will understand either way.
SQL is pretty flexible with whitespace so you can spread your statements out on to as many lines as you want. Keeping things aligned can help make big statements easier to read. Just remember to end with a semicolon!
Here are the schema diagrams to help:
Column | Type | Modifiers |
---|---|---|
id | integer | not null default |
first_name | character varying(100) | not null |
surname | character varying(100) | not null |
location | character varying(100) |
Column | Type | Modifiers |
---|---|---|
id | integer | not null default |
name | character varying(100) | not null |
release_date | date | not null |
publisher_id | integer | foreign key (publishers.id) |
Column | Type | Modifiers |
---|---|---|
id | integer | not null default |
name | character varying(100) | not null |
Column | Type | Modifiers |
---|---|---|
book_id | integer | foreign key (books.id) |
author_id | integer | foreign key (authors.id) |
Please don't feel that you have to get through all of them or be able to answer them all right away! The idea is to introduce you to the kind of queries we do regularly with SQL.
These challenges cover the basics of SQL: selects, joins and conditions.
first_name | surname |
---|---|
Sharon | Smith |
Ted | Burns |
Stephen | Wistle |
Amanda | Bertwistle |
David | Grewal |
John | White |
Paul | Hallam-Wistle |
Paul | Jones |
id | first_name | surname | location |
---|---|---|---|
4 | Amanda | Bertwistle | Nazareth |
2 | Ted | Burns | London |
5 | David | Grewal |
id | first_name | surname | location |
---|---|---|---|
1 | Sharon | Smith | Nazareth |
2 | Ted | Burns | London |
4 | Amanda | Bertwistle | Nazareth |
6 | John | White | London |
7 | Paul | Hallam-Wistle | London |
8 | Paul | Jones | Nazareth |
id | first_name | surname | location |
---|---|---|---|
2 | Ted | Burns | London |
3 | Stephen | Wistle | |
5 | David | Grewal | |
6 | John | White | London |
7 | Paul | Hallam-Wistle | London |
id | first_name | surname | location |
---|---|---|---|
3 | Stephen | Wistle | |
4 | Amanda | Bertwistle | Nazareth |
7 | Paul | Hallam-Wistle | London |
'No Starch Press'
name | name |
---|---|
No Starch Press | Python Made Easy |
No Starch Press | JavaScript: The Really Good Parts |
Note: Only one author per row, so the book's name may need to be repeated.
name | first_name | surname |
---|---|---|
Python Made Easy | Sharon | Smith |
Python Made Easy | David | Grewal |
Python Made Easy | Amanda | Bertwistle |
SQL: Part 2 | Sharon | Smith |
JavaScript: The Really Good Parts | Stephen | Wistle |
JavaScript: The Really Good Parts | David | Grewal |
Java in Japanese | Paul | Jones |
Java in Japanese | Ted | Burns |
Java in Japanese | Stephen | Wistle |
Java in Japanese | David | Grewal |
Java in Japanese | Amanda | Bertwistle |
Elm Street | David | Grewal |
Elm Street | John | White |
Elm Street | Sharon | Smith |
CSS: Cansei | Amanda | Bertwistle |
CSS: Cansei | Paul | Hallam-Wistle |
Ruby Gems | Ted | Burns |
Ruby Gems | Paul | Hallam-Wistle |
C++ | Paul | Jones |
C++ | John | White |
C++ | David | Grewal |
C++ | Sharon | Smith |
CoffeeScript in Java | Paul | Hallam-Wistle |
CoffeeScript in Java | Stephen | Wistle |
Swift in 10 Days | Stephen | Wistle |
Swift in 10 Days | David | Grewal |
'Java in Japanese' and 'Ruby Gems'
These slightly trickier challenges will require you to use aggregate functions and/or subqueries.
first_name | surname |
---|---|
Paul | Hallam-Wistle |
David | Grewal |
Sharon | Smith |
Amanda | Bertwistle |
Stephen | Wistle |
name | count |
---|---|
McGraw-Hill | 4 |
The Big Publishing House | 3 |
No Starch Press | 2 |
Mega Corp Ltd | 1 |
name | count |
---|---|
Java in Japanese | 5 |
C++ | 4 |
Elm Street | 3 |
Swift in 10 Days | 2 |
CoffeeScript in Java | 2 |
Ruby Gems | 2 |
Highest: 'Java in Japanese' (5 authors)
Lowest: 'SQL: Part 2' (1 author)
David Grewal, 6
Doing these is not required! Only look at these if you have time at the end.
-
I forgot to make a primary key for books_authors table. Alter the table to create a new column to contain a primary key made up of 'book_id' and 'author_id'.
-
What's the average number of authors per book? (Answer: 2.6)
-
Show every author who has only written for one publisher. (Answer: Ted Burns)
-
Which location has the higher figure for books per author? (Answer: Nazareth)
-
Let's say you are the first developer at a new start up called 'Amazonia'. Your boss asks you to modify the database so that customers can add books to their shopping carts. What tables and associations would you need?