/coalesce-22-advanced-testing-workshop

This repository contains the project for the Coalesce 2022 Advanced Testing workshop.

🔬 Advanced Testing Workshop 🔬

Welcome to the Advanced Testing workshop of Coalesce 2022!

🥼 About this Project 🧪

You've been using dbt for a while and are comfortable with the built-in tests. Part of you wonders: "Are there other tests I could or should be using?" or "What if I want to make my own tests?" You've come to the right place! In this workshop we'll walk through advanced testing in dbt and how you can use it to improve the reliability of your project, so you can sleep better at night knowing your code is clean.

✅ Prerequisites:

This workshop assumes that you're familiar with dbt. At a minimum, you should know how to:

  • Apply and run built-in tests (a short refresher sketch follows this list)
  • Run commands and compile code
  • Create branches
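
For a quick refresher, built-in (generic) tests are declared in a model's YAML file and run with dbt test. The sketch below shows the general shape; the model and column names are illustrative and not taken from this project:

    version: 2

    models:
      - name: stg_orders          # illustrative model name
        columns:
          - name: order_id
            tests:
              - unique            # fails if any order_id appears more than once
              - not_null          # fails if any order_id is missing

When you run dbt test (or dbt build), dbt compiles each declared test into a query that returns failing rows; a test passes when that query returns zero rows.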
Live participants

For the workshop, you will be given access to the dbt Cloud account with all the necessary prerequisites.

All others
  1. A Repository
    Ideally, with the files and folders contained in this workshop. To make a copy, fork this repository.

  2. dbt
    Using dbt Cloud vs. dbt Core doesn't matter; either works for this workshop. To set up dbt, follow the setup guides in the dbt docs.

  3. Some Data
    This project is written on top of BigQuery and uses the publicly available TPC-H data set. A truncated version of the data set has been included in this project as CSV files, located in the _resources folder.

    If you don't have data or a warehouse yet, don't worry - the setup below will guide you through setting up a free BigQuery account and loading the data for this project. If you get stuck, the Additional Helpful Links at the bottom of this README are a good place to start.

    Note:
    We don't suggest seeding the CSV files. Though they are truncated, they still contain a significant number of rows.

🧰 Setup

Live participants
  1. Navigate to the Coalesce 2022 Workshop - Advanced testing account.

  2. Configure your development credentials:

    1. Click on your user profile in the top left-hand corner and click Profile Settings.
    2. Scroll to the "Credentials" section.
    3. Click on Analytics.
    4. Hit the Edit button in the lower right-hand corner.
    5. Change these configurations:
      • Dataset: set this to dbt_ followed by your first initial and last name. Example: dbt_bregenold
      • Target Name: set this to dev
    6. Hit Save.
  3. Create a new branch named first initial + last name_coalesce_22. Example: bregenold_coalesce_22

  4. Run dbt deps to install dependencies (see the packages.yml sketch after these steps).

  5. Confirm your setup:

    1. Navigate to the IDE by clicking on the Develop tab in the upper right-hand corner
    2. Try running the following commands:
    $ dbt run
    $ dbt test

    or alternatively:

    $ dbt build

    Don't worry when you see an error on stg_tpch__part_suppliers. Stop when you hit that error!
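
A note on step 4: dbt deps installs whatever packages are pinned in the project's packages.yml. You don't need to edit that file for this workshop, but here is a sketch of what it looks like; the package and version below are illustrative, not necessarily what this repository pins:

    packages:
      - package: dbt-labs/dbt_utils
        version: 0.9.2            # illustrative version - use whatever the repo's packages.yml specifies

dbt deps downloads each listed package so that its macros and tests are available to the project.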

All others
  1. Fork this repository.

  2. Set up your dbt Project

    Important
    If you don't set up the BigQuery account and want to use another warehouse:

    • You'll need a warehouse - dbt can't run anything without a warehouse connection.
    • You'll need to load the data to your selected warehouse using another method.
    • You'll need to make changes to the repository code you forked so the syntax works with your warehouse.
  3. Load the data

    Download the files from the _resources/tpch_dataset folder. If you are working locally, the files are already in the repository folder on your computer.

    • If you set up a BigQuery account during setup, load the data:

      1. In the BigQuery UI's Explorer pane, click the three dots next to your project name.
      2. Click Create dataset.
      3. For Dataset ID, type raw_tpch.
      4. Click Create dataset.
      5. You should now see your dataset listed under your project name. Click the three dots next to the dataset.
      6. Click Create table.
      7. Choose Upload as the Create table from option.
      8. Click Browse under Select file.
      9. Upload each file you downloaded from the _resources/tpch_dataset folder:
        • For the table name, use the file name without the extension. Some file names have _100mb appended; omit this.
        • Make sure to check Auto detect under Schema.
    • If you didn't set up BigQuery, load the data from _resources/tpch_dataset into your warehouse.
      You will need to update the _sources.yml file with the location of your data (see the sketch after these setup steps).

  4. Run dbt deps to install dependencies.

  5. Confirm your setup:
    Try running the following commands:

    $ dbt run
    $ dbt test

    or alternatively:

    $ dbt build

Don't worry when you see an error on stg_tpch__part_suppliers. When you hit this error you're ready to start the lesson!
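
About the _sources.yml change mentioned above: if your raw TPC-H tables live somewhere other than the location the project expects, you typically only need to point the source's database and schema at wherever you loaded the data. The snippet below is a sketch of the general shape; keep the source and table names that the project's _sources.yml already declares and adjust only the location fields:

    version: 2

    sources:
      - name: tpch                      # keep the source name the project already uses
        database: your-gcp-project-id   # BigQuery project (or database/catalog in another warehouse)
        schema: raw_tpch                # the dataset/schema you loaded the files into
        tables:
          - name: orders
          - name: customer
          # one entry per TPC-H table you loaded

Once the location fields match where you loaded the data, the dbt run and dbt test commands above will be able to find the raw tables.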

 

🎉 You're ready to move on to the next stage! 🎉

Live participants:
We're asking that you don't go hopping into the walkthrough just yet! We'll be training together live! 💜

Additional Helpful Links:

  • Learn more about dbt in the docs
  • Check out Discourse for commonly asked questions and answers
  • Join the dbt community to learn from other analytics engineers
  • Find dbt events near you
  • Check out the blog for the latest news on dbt's development and best practices