Welcome to the Advanced Testing workshop of Coalesce 2022!
You've been using dbt for a while, and are comfortable with the built in tests. Part of you wonders: "are there other tests I should/could be using?" or "what if I want to make my own tests?". You've come to the right place! In this workshop we'll be going through advanced testing within dbt, and how you can use it to improve the reliability of your project, and sleep better at night knowing that your code is clean.
This workshop assumes that you're familiar with dbt. At a minimun you should know how to:
- Apply and run built-in tests
- Run commands and compile code
- Create branches
Live participants
For the workshop, you will be given access to the dbt Cloud account with all the necessary prerequisites.
All others
-
A Repository
Ideally, with the files and folders contained in this workshop. To make a copy, fork this repository. -
dbt
Using dbt Cloud vs. dbt Core doesn't matter. You'll specifically want to know how to:- install packages
- use built-in tests
- use selection syntax
- upgrade your dbt version, if needed (This project uses v1.3)
To setup dbt:
-
Some Data
This project is written on top of BigQuery and uses the publicly available TPC-H data set. A truncated version of the data set has been included in this project as CSV files, located in the _resources folder.If you don't have some data or a warehouse yet, don't worry - the setup will guide you through setting up a free BigQuery account and loading the data for this project. Here are some resources to reference, just in case:
- Instructions for setting up a free BigQuery account
- Instructions for loading CSV files into BigQuery
- Starter instructions for accessing the TPC-H dataset yourself
Note:
We don't suggest seeding the CSV files. Though they are truncated, they still contain a significant amount of rows.
Live participants
-
Navigate to the
Coalesce 2022 Workshop - Advanced testing
account. -
Configure your development credentials:
- Click on your user profile in the top left-hand corner and click
Profile Settings
- Scroll to the "Credentials" section.
- Click on
Analytics
- Hit the
Edit
button in the lower right hand corner. - Change these configurations:
Dataset Set this to dbt_
your first initial + last name. Example:dbt_bregenold
Target Name Set this to dev
- Hit
Save
- Click on your user profile in the top left-hand corner and click
-
Create a new branch named first initial + last name_coalesce_22. Example:
bregenold_coalesce_22
-
Run
dbt deps
to install dependencies. -
Confirm your setup:
- Navigate to the IDE by clicking on the
Develop
tab in the upper right-hand corner - Try running the following commands:
$ dbt run $ dbt test
or alternatively:
$ dbt build
Don't worry when you see an error on
stg_tpch__part_suppliers
. Stop when you hit that error! - Navigate to the IDE by clicking on the
All others
-
Set up your dbt Project
Important
If you don't set up the BigQuery account and want to use another warehouse:- You'll need a warehouse - the warehouse is an essential connection in dbt.
- You'll need to load the data to your selected warehouse using another method.
- You'll need to make changes to the repository code you forked so the syntax works with your warehouse.
-
Load the data
Download the files from the
_resources/tpch_dataset
. If you are working locally, the files will be within the repository location on your computer.-
If you set up a BigQuery account during setup, load the data:
- In the BigQuery UI's
Explorer
pane, click the three dots next to your project name - Click
Create dataset
. - For
Dataset ID
, typeraw_tpch
. - Click
Create dataset
- You should now see your dataset listed under your project name. Click the three dots next to the dataset.
- Click
Create table
- Choose
Upload
as the Create table from option. - Click
Browse
underSelect file
- Upload each file you downloaded from the
_resources/tpch_dataset
folder:- For the table name, use the file name without the extension. Some file names have
_100mb
appended. Omit this. - Make sure to check
Auto detect
under Schema
- For the table name, use the file name without the extension. Some file names have
- In the BigQuery UI's
-
If you didn't setup BigQuery, load the data from the
_resources/tpch_dataset
into your warehouse.
You will need to update the_sources.yml
file with the location of your data.
-
-
Run
dbt deps
to install dependencies. -
Confirm your setup:
Try running the following commands:$ dbt run $ dbt test
or alternatively:
$ dbt build
Don't worry when you see an error on stg_tpch__part_suppliers
. When you hit this error you're ready to start the lesson!
Live participants:
We're asking that you don't go hopping in to the walkthrough just yet! We'll be training together live! ๐
- Learn more about dbt in the docs
- Check out Discourse for commonly asked questions and answers
- Join the dbt community to learn from other analytics engineers
- Find dbt events near you
- Check out the blog for the latest news on dbt's development and best practices