course_advanced_dbt

This is the repository for the Bingeflix dbt project (used in the CoRise Advanced dbt course).

Welcome to the Bingeflix Data Team

First-Time Setup

Part 1: Setup Snowflake account

Use the information below to log in to the course Snowflake instance:

  • Username: derive it from your email address (keep the letters and digits, drop everything else). Here are some examples:
      • sourabh@corise.com ➡ sourabhcorisecom
      • sourabh.bajaj@corise.com ➡ sourabhbajajcorisecom
      • Sourabh.BAJAJ.1234@gmail.com ➡ SourabhBAJAJ1234gmailcom
  • Password: ADD PASSWORD (you'll be prompted to change this; remember to save your new password)
  • Once logged into Snowflake, ensure you have access to the TRANSFORMER role (each student will be using their own custom schema)
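The derivation rule in the examples above boils down to "keep letters and digits, drop everything else". A small Python sketch of that rule, if you'd like to double-check your own username (the function name is ours, not part of the course tooling):

```python
import re

def snowflake_username(email: str) -> str:
    """Derive the course Snowflake username from an email address:
    keep letters and digits (case preserved), drop everything else."""
    return re.sub(r"[^A-Za-z0-9]", "", email)

print(snowflake_username("sourabh@corise.com"))            # sourabhcorisecom
print(snowflake_username("Sourabh.BAJAJ.1234@gmail.com"))  # SourabhBAJAJ1234gmailcom
```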

Part 2: Set up your GitHub repo

We'll be setting up your GitHub fork of this repository. Please review the visual walk-through for the detailed steps.

Part 3: Setting up Gitpod

Next, you'll set up Gitpod using the repo fork you just created. Follow the steps below to set this up:

  • Go to gitpod.io and click Dashboard (or Login) in the top right corner
  • Click 'Continue with GitHub' to log in if you are not already logged in
  • Enter your GitHub credentials
  • Select the editor you'd like to use (VSCode in Browser is our default, but you'll see you can choose others if you prefer)
  • Click Continue to open the workspace (this may take a few minutes)
  • Explore the workspace
  • You'll need to "Pin" your workspace so that it doesn't get archived after 14 days, and always "Stop" your workspace when you finish so that you don't run out of hours. See the Week 0 video on CoRise for how to do both.

Part 4: Install dbt inside of a virtual environment

We'll go over virtual environments in more detail during week 1, but for now, you can follow these instructions to get dbt installed:

  • Ensure you're in the root directory of the repo (/workspace/course_advanced_dbt)
  • Run touch Pipfile to create a blank Pipfile (this should not have a file extension)
  • Run open Pipfile. This will open the blank file in the editor window.
  • Copy and paste the following into the Pipfile:
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
dbt-core = "==1.5.2"
dbt-snowflake = "==1.5.2"
sqlfluff = "==2.1.2"
sqlfluff-templater-dbt = "==2.1.2"
pre-commit = "==3.3.3"

[requires]
python_version = "3.11"
  • Run pip install pipenv to install the pipenv tool
  • Run pipenv install to create the virtual environment and install the packages listed in the Pipfile (this may take a few moments; you should see installation logs in your terminal)
  • Run pipenv shell to activate the virtual environment
  • Run dbt --version to confirm that dbt has been installed properly
  • Important: you will need to activate the virtual environment each time you open Gitpod by running pipenv shell. dbt is installed inside this virtual environment, so the dbt command will not be available unless it is activated.
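If you're ever unsure whether the virtual environment is active, Python itself can tell you: inside a venv (including one created by pipenv), sys.prefix points at the environment while sys.base_prefix still points at the system interpreter. A quick generic check (not part of the course tooling):

```python
import sys

def in_virtualenv() -> bool:
    """True when running inside a venv/pipenv environment."""
    # In a virtual environment sys.prefix differs from sys.base_prefix;
    # outside one, the two are identical.
    return sys.prefix != sys.base_prefix

print("virtualenv active:", in_virtualenv())
```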

Part 5: Setup your profiles.yml file

  • Run cd .. to move up from the project root folder to the /workspace directory
  • Run ls -a to list the contents of this folder, including hidden folders; you should see a folder in the list called .dbt (if you don't, run mkdir .dbt to create it)
  • Run cd .dbt to open this hidden folder
  • Run touch profiles.yml to create a new profiles.yml file
  • Run open profiles.yml to open the file you just created.
  • Copy and paste the following credentials into the new profiles.yml file you just created:
advanced_dbt:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: efruuit-zh04827
      user: <ADD YOUR SNOWFLAKE USER NAME>
      password: <ADD YOUR SNOWFLAKE PASSWORD>
      role: transformer
      database: dev
      warehouse: transforming
      schema: dbt_<YOUR SNOWFLAKE USERNAME>
      threads: 4

  • Run cd /workspace/course_advanced_dbt to navigate back to the folder that contains the dbt_project.yml file
  • Validate that dbt can connect to Snowflake by running dbt debug
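If dbt debug fails with an authentication error, a common cause is a <...> placeholder left unedited in profiles.yml. A small script to scan the file's text for leftover placeholders (a hypothetical helper, not part of the course tooling):

```python
import re

def unfilled_placeholders(text: str) -> list[str]:
    """Return any '<...>' placeholders still present in profiles.yml text."""
    return re.findall(r"<[^<>\n]+>", text)

# To run against your real file once saved:
#   from pathlib import Path
#   print(unfilled_placeholders(Path("/workspace/.dbt/profiles.yml").read_text()))
sample = "user: <ADD YOUR SNOWFLAKE USER NAME>\npassword: hunter2\n"
print(unfilled_placeholders(sample))  # the user line still needs editing
```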

Awesome! You are now ready to start the course! 🚀🚀🚀

Coding Conventions

General

  • Use UPPER case for all keywords
  • Use trailing commas in SELECT statements
  • Use Snowflake dialect
  • Use consistent style in GROUP BY and ORDER BY (either names or numbers, not both)
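Put together, a query following these conventions might look like this (the table and column names are purely illustrative):

```sql
-- UPPERCASE keywords, trailing commas, GROUP BY on names (not numbers)
SELECT
    user_id,
    plan_type,
    COUNT(*) AS session_count
FROM sessions
GROUP BY
    user_id,
    plan_type
ORDER BY
    session_count DESC
```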

Testing Conventions

Sources

  • The primary key source column must have not_null and unique generic tests.
  • All boolean columns must have an accepted_values schema test. The accepted values are true and false.
  • Columns that contain category values must have an accepted_values schema test.
  • Columns that should never be null must have a not_null schema test.
  • Columns that should be unique must have a unique schema test.

Models

  • The primary key column must have not_null and unique schema tests.
  • All boolean columns must have an accepted_values schema test. The accepted values are true and false.
  • Columns that contain category values must have an accepted_values schema test.
  • Columns that should never be null must have a not_null schema test.
  • Columns that should be unique must have a unique schema test.
  • Where possible, use schema tests from the dbt_utils or dbt_expectations packages to perform extra verification.
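As an illustration, the model conventions above might look like this in a schema.yml file (the model and column names are hypothetical, and the exact accepted_values would come from your data):

```yaml
version: 2

models:
  - name: dim_subscriptions        # hypothetical model
    columns:
      - name: subscription_id      # primary key
        tests:
          - not_null
          - unique
      - name: is_active            # boolean column
        tests:
          - accepted_values:
              values: [true, false]
              quote: false         # compare as booleans, not strings
      - name: plan_type            # category column
        tests:
          - accepted_values:
              values: ['basic', 'standard', 'premium']
```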