<<<<<<< HEAD
Welcome! The course was created for the University of Toronto's Data Sciences Institute. It has been designed for individuals with no prior expertise in data science, empowering learners to thrive in today's data-driven world.
- Description
- Course Contacts
- Learning Outcomes
- Design
- Expectations
- Policies
- Schedule
- Folder Structure
- Acknowledgements
SQL is used across the machine learning pipeline, and a fundamental skill for data scientists to master. This course will focus on the technical skills needed for working with SQL, including flat-file datasets (JSON, CSV) ingestion, query design, and relational database management. Additionally, it will examine common data management concerns, data access management, and data privacy adherence.
Learners are first introduced to data modelling and how to think about the structure of databases. The majority of the course is devoted learning the necessary commands and their syntax to properly extract information from database. Throughout the entire course, students will learn how to problem solve through live coding.
- Instructor: Emails to Thomas (he/him) can be sent to thomas.rosenthal@utoronto.ca
- TAs: Emails to Ananya (she/her) can be sent to ananya.jha@mail.utoronto.ca. Additional support, Vishnou Vinayagame (he/him) vishnouvina@cs.toronto.edu
- Develop a better understanding of the structure of databases.
- Save and transport data in CSV and JSON file formats.
- Familiarity with the essentials of querying and manipulating data in SQL and how to search for future questions.
- Familiarity with the legal framework around sharing data.
- Analyze data requirements and work with different stakeholders such as analysts and managers.
The course runs synchronously over Zoom. It consists of six classes. Classes are 6 PM - 830 PM EST on Tuesday and Wednesday for select dates. Being mindful of online fatigue, there will be one or two breaks during each class where students are encouraged to stretch, grab a drink and snacks, or ask any additional questions.
Tutorials: sessions with both TAs will also be offered over Zoom. These will take place 30 minutes before and after each session.
Pedagogy: This course will use a dedicated Etherboard for student collaboration. The link will be provided in the first class. New content will be added by the instructor before each session. Each session will consist of slides to introduce topics, live coding to demonstrate the topics, and occasional breakout rooms/live polls to reinforce the topics.
The course is designed for both live-coding and corresponding slides. Students should follow along with the coding, interact with the study material SQL for Data Scientists by Renee Teate, and think about the broader data landscape. Students are encouraged to ask questions throughout.
- A computer with internet access is required to attend sessions and participate in live-coding.
- Download DB Browser for SQLite here: https://sqlitebrowser.org/dl/
- Downloading R and/or Python is not required but encouraged, please see the respective DSI repositories for instructions on installing these languages and your preferred IDE.
- Accessibility: We want to provide an accessible learning environment for all. If there is something we can do to make this course more accessible to you, please let us know.
- Course communications: Essential communications take place over email. Please include "DSI-SQL" or similar in the subject line.
- Camera: Keeping your camera on is welcomed, but optional.
- Microphone: Please keep microphones muted unless you need to speak. Questions are highly encouraged.
- Slack: Use the DSI Slack to keep up to date and ask and answer questions. Please be respectful and follow the code of conduct while communicating on Slack.
Lecture | Date | Topic | Slides |
---|---|---|---|
0 | Before First Lecture | Install & Pre-Class | Setup |
1 | Tues, February 20 6 - 8:30 PM EST |
Introduction, Data Modelling, Schema Design, Data Structures | Introduction |
2 | Wed, February 21 6 - 8:30 PM EST |
Building Queries: SELECT, FROM, WHERE, CASE, DISTINCT, JOINs | Building Queries |
3 | Tues, February 27 6 - 8:30 PM EST |
Essential Techniques: Aggregation Functions, Subqueries, Temporary Tables, CTEs, Datetime Functions | Essential Techniques |
4 | Wed, February 28 6 - 8:30 PM EST |
Advanced Techniques: NULL Management, Windowed Functions, String Manipulation, UNION & UNION ALL, INTERSECT & EXCEPT | Advanced Techniques |
5 | Tues, March 12 6 - 8:30 PM EST |
Expanding your Database: INSERT, UPDATE, DELETE, Importing & Exporting Data, CROSS & Self Joins, Views | Expanding your Database |
6 | Wed, March 13 6 - 8:30 PM AM EST |
Beyond SQL: Normal Forms, SQL and the Machine Learning pipeline, Broader Data Landscape, Reproducibility, Ethics, SQL in the wild | Beyond SQL |
Grading is pass/fail. There is only one assignment and six short (but mandatory) homeworks. Class Participation is 12% of the final grade. Submit via your Google Drive using lastname_firstname_SQL_HW#.
Assessment | Due Date | SQL | Weight |
---|---|---|---|
Assignment 1 | Sat, March 23, by 11:59 PM EST | Rubric | 40% |
Homework 1 PDF | Thurs, February 22, by 11:59 PM EST | ¯\_(ツ)_/¯ | 8% |
Homework 2 PDF | Sat, February 24, by 11:59 PM EST | Homework 2 SQL | 8% |
Homework 3 PDF | Sat, March 2, by 11:59 PM EST | Homework 3 SQL | 8% |
Homework 4 PDF | Sat, March 2, by 11:59 PM EST | Homework 4 SQL | 8% |
Homework 5 PDF | Thurs, March 14, by 11:59 PM EST | Homework 5 SQL | 8% |
Homework 6 PDF | Sat, March 16, by 11:59 PM EST | Homework 6 SQL | 8% |
Below are the folders contained in this repo with a description of what they contain and information on how to use them.
Slides were built in R with xaringan
. Output files are HTML.
PDF version of slides, using renderme
. Some minor differences may exist.
FarmersMarket.db, in-class SQL codes.
Data (csv, json, etc) we'll use for importing.
Homework to be done after each module.
Rubrics and assignment details.
First slides were originally developed by Faria Khandaker and Paul Hodgetts under the supervision of Rohan Alexander. Great suggestions were made by A Mahfouz and Lorena Almaraz. Slides have been created and modified by Thomas in 2022, 2023, and 2024.
Welcome! The course was created for the University of Toronto's Data Sciences Institute. It has been designed for individuals with no prior expertise in data science, empowering learners to thrive in today's data-driven world.
- Description
- Course Contacts
- Learning Outcomes
- Design
- Expectations
- Policies
- Schedule
- Folder Structure
- Acknowledgements
SQL is used across the machine learning pipeline, and a fundamental skill for data scientists to master. This course will focus on the technical skills needed for working with SQL, including flat-file datasets (JSON, CSV) ingestion, query design, and relational database management. Additionally, it will examine common data management concerns, data access management, and data privacy adherence.
Learners are first introduced to data modelling and how to think about the structure of databases. The majority of the course is devoted learning the necessary commands and their syntax to properly extract information from database. Throughout the entire course, students will learn how to problem solve through live coding.
- Instructor: Emails to Thomas (he/him) can be sent to thomas.rosenthal@utoronto.ca
- TAs: Emails to Ananya (she/her) can be sent to ananya.jha@mail.utoronto.ca. Additional support, Vishnou Vinayagame (he/him) vishnouvina@cs.toronto.edu
- Develop a better understanding of the structure of databases.
- Save and transport data in CSV and JSON file formats.
- Familiarity with the essentials of querying and manipulating data in SQL and how to search for future questions.
- Familiarity with the legal framework around sharing data.
- Analyze data requirements and work with different stakeholders such as analysts and managers.
The course runs synchronously over Zoom. It consists of six classes. Classes are 6 PM - 830 PM EST on Tuesday and Wednesday for select dates. Being mindful of online fatigue, there will be one or two breaks during each class where students are encouraged to stretch, grab a drink and snacks, or ask any additional questions.
Tutorials: sessions with both TAs will also be offered over Zoom. These will take place 30 minutes before and after each session.
Pedagogy: This course will use a dedicated Etherboard for student collaboration. The link will be provided in the first class. New content will be added by the instructor before each session. Each session will consist of slides to introduce topics, live coding to demonstrate the topics, and occasional breakout rooms/live polls to reinforce the topics.
The course is designed for both live-coding and corresponding slides. Students should follow along with the coding, interact with the study material SQL for Data Scientists by Renee Teate, and think about the broader data landscape. Students are encouraged to ask questions throughout.
- A computer with internet access is required to attend sessions and participate in live-coding.
- Download DB Browser for SQLite here: https://sqlitebrowser.org/dl/
- Downloading R and/or Python is not required but encouraged, please see the respective DSI repositories for instructions on installing these languages and your preferred IDE.
- Accessibility: We want to provide an accessible learning environment for all. If there is something we can do to make this course more accessible to you, please let us know.
- Course communications: Essential communications take place over email. Please include "DSI-SQL" or similar in the subject line.
- Camera: Keeping your camera on is welcomed, but optional.
- Microphone: Please keep microphones muted unless you need to speak. Questions are highly encouraged.
- Slack: Use the DSI Slack to keep up to date and ask and answer questions. Please be respectful and follow the code of conduct while communicating on Slack.
Lecture | Date | Topic | Slides |
---|---|---|---|
0 | Before First Lecture | Install & Pre-Class | Setup |
1 | Tues, February 20 6 - 8:30 PM EST |
Introduction, Data Modelling, Schema Design, Data Structures | Introduction |
2 | Wed, February 21 6 - 8:30 PM EST |
Building Queries: SELECT, FROM, WHERE, CASE, DISTINCT, JOINs | Building Queries |
3 | Tues, February 27 6 - 8:30 PM EST |
Essential Techniques: Aggregation Functions, Subqueries, Temporary Tables, CTEs, Datetime Functions | Essential Techniques |
4 | Wed, February 28 6 - 8:30 PM EST |
Advanced Techniques: NULL Management, Windowed Functions, String Manipulation, UNION & UNION ALL, INTERSECT & EXCEPT | Advanced Techniques |
5 | Tues, March 12 6 - 8:30 PM EST |
Expanding your Database: INSERT, UPDATE, DELETE, Importing & Exporting Data, CROSS & Self Joins, Views | Expanding your Database |
6 | Wed, March 13 6 - 8:30 PM AM EST |
Beyond SQL: Ethics, Normal Forms, SQLite in R & python | Beyond SQL |
Grading is pass/fail. There is only one assignment and six short (but mandatory) homeworks. Class Participation is 12% of the final grade. Submit via your Google Drive using lastname_firstname_SQL_HW#.
Assessment | Due Date | SQL | Weight |
---|---|---|---|
Assignment 1 | Sat, March 23, by 11:59 PM EST | Rubric | 40% |
Homework 1 PDF | Thurs, February 22, by 11:59 PM EST | ¯\_(ツ)_/¯ | 8% |
Homework 2 PDF | Sat, February 24, by 11:59 PM EST | Homework 2 SQL | 8% |
Homework 3 PDF | Sat, March 2, by 11:59 PM EST | Homework 3 SQL | 8% |
Homework 4 PDF | Sat, March 2, by 11:59 PM EST | Homework 4 SQL | 8% |
Homework 5 PDF | Sat, March 16, by 11:59 PM EST | Homework 5 SQL | 8% |
Homework 6 PDF | Sat, March 16, by 11:59 PM EST | (╯°□°)╯︵ ┻━┻ | 8% |
Below are the folders contained in this repo with a description of what they contain and information on how to use them.
Slides were built in R with xaringan
. Output files are HTML.
PDF version of slides, using renderme
. Some minor differences may exist.
FarmersMarket.db, in-class SQL codes.
Data (csv, json, etc) we'll use for importing.
Homework to be done after each module.
Rubrics and assignment details.
First slides were originally developed by Faria Khandaker and Paul Hodgetts under the supervision of Rohan Alexander. Great suggestions were made by A Mahfouz and Lorena Almaraz. Slides have been created and modified by Thomas in 2022, 2023, and 2024.
fe8a61321786d8c5fd013633f83ec325fb87a057