👑 Inherited Project Refactoring Workshop 👑

Welcome to the Inherited Project Refactoring workshop at Coalesce 2022!

🧙 About this Project 🐱

You've just started working at a new job and they've been using dbt to transform their data (YES!). However, once you've made it into their project, you realize that you can't make heads or tails of their data flows. Let's be honest - it's a messy project and you desperately want to form a plan for cleaning it up.

dbt's already got some great material out there regarding refactoring (if you want to start with hands-on-code refactoring, you can start with the refactoring course), but the road from where you are to where you need to be can be hard to navigate unless you've traveled it numerous times. This refactoring workshop focuses on the roadmap for refactoring, and is the perfect start to getting that extra practice in!

Part of Christine and Lauren's day-to-day is taming those crazy projects - they've created this workshop to teach you the art of planning, show you some tips and tricks, and give you some leveling-up advice for wrangling those DAGs!

✅ Prerequisites:

This workshop assumes that you're familiar with dbt. At a minimum you should know how to:

Generate documentation
Run commands and compile code
Create branches

Live participants

For the workshop, you will be given access to the dbt Cloud account with all the necessary prerequisites.

All others

A Repository
Ideally, with the files and folders contained in this workshop. To make a copy, fork this repository.
dbt
Using dbt Cloud vs. dbt Core doesn't matter. You'll specifically want to know how to:
- install packages
- generate and view documentation
- use selection syntax
- upgrade your dbt version, if needed (This project uses v1.3)
To set up dbt:
- dbt Cloud Setup
- dbt Core Setup
Some Data
This project is written on top of BigQuery and uses the publicly available TPC-H data set. A truncated version of the data set has been included in this project as CSV files, located in the _resources folder.

If you don't have some data or a warehouse yet, don't worry - the setup will guide you through setting up a free BigQuery account and loading the data for this project. Here are some resources to reference, just in case:
- Instructions for setting up a free BigQuery account
- Instructions for loading CSV files into BigQuery
- Starter instructions for accessing the TPC-H dataset yourself
Note:
We don't suggest seeding the CSV files. Though they are truncated, they still contain a significant number of rows.

🧰 Setup

Live participants

Navigate to the Coalesce 2022 Workshop - Refactoring dbt Cloud account.
Configure your development credentials:
1. Click on your user profile in the top left-hand corner and click Profile Settings
2. Scroll to the "Credentials" section.
3. Click on Analytics
4. Hit the Edit button in the lower right-hand corner.
5. Change these configurations:
   - Dataset: Set this to `dbt_` + your first initial + last name. Example: `dbt_cberger`
   - Target Name: Set this to `dev`
6. Hit Save
Run dbt deps to install dependencies.
Confirm your setup:
1. Navigate to the IDE by clicking on the Develop tab in the upper right-hand corner
2. Try running the following commands:
```
$ dbt run
$ dbt test
```
or alternatively:
```
$ dbt build
```
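The Dataset naming convention from the credentials step (`dbt_` plus first initial plus last name) can be sketched in shell. This is only an illustration: the function name `dev_dataset_name` and the sample name are hypothetical, and lowercasing the result is an assumption based on the `dbt_cberger` example.

```shell
# Build a development dataset name: "dbt_" + first initial + last name.
# Assumption: the result is lowercased, matching the dbt_cberger example.
# The function name and the sample name below are hypothetical.
dev_dataset_name() {
  first="$1"
  last="$2"
  initial=$(printf '%s' "$first" | cut -c1)
  printf 'dbt_%s%s\n' "$initial" "$last" | tr '[:upper:]' '[:lower:]'
}

dev_dataset_name "Casey" "Berger"
```

Whatever name you end up with, type it into the Dataset field exactly as you want the dataset to appear in the warehouse.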



All others

Fork this repository.
Set up your dbt Project
- dbt Cloud Setup
- dbt Core Setup
Important
If you don't set up the BigQuery account and want to use another warehouse:
- You'll need a warehouse - the warehouse is an essential connection in dbt.
- You'll need to load the data to your selected warehouse using another method.
- You'll need to make changes to the repository code you forked so the syntax works with your warehouse.
Load the data

Download the files from the _resources/tpch_dataset folder. If you are working locally, the files will be within the repository location on your computer.
- If you set up a BigQuery account during setup, load the data:
  1. In the BigQuery UI's Explorer pane, click the three dots next to your project name
  2. Click Create dataset.
  3. For Dataset ID, type raw_tpch.
  4. Click Create dataset
  5. You should now see your dataset listed under your project name. Click the three dots next to the dataset.
  6. Click Create table
  7. Choose Upload as the Create table from option.
  8. Click Browse under Select file
  9. Upload each file you downloaded from the _resources/tpch_dataset folder:
    - For the table name, use the file name without the extension. Some file names have _100mb appended. Omit this.
    - Make sure to check Auto detect under Schema
- If you didn't set up BigQuery, load the data from the _resources/tpch_dataset folder into your warehouse.
  You will need to update the _sources.yml file with the location of your data.
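If you would rather work from a terminal than click through the BigQuery UI, the table-naming rule above (file name without the extension, minus any `_100mb` suffix) can be sketched in shell. This is a minimal sketch and not part of the workshop materials. It assumes the files end in `.csv` and that the Google Cloud `bq` command-line tool is installed and authenticated. The function names `table_name_for` and `print_load_commands` are hypothetical. The script only prints the `bq load` commands so you can review them before running anything.

```shell
# Derive a table name from a file name: drop the directory, the .csv
# extension, and a trailing _100mb suffix if present.
# (Hypothetical helper, not part of the repository.)
table_name_for() {
  name=$(basename "$1" .csv)
  printf '%s\n' "${name%_100mb}"
}

# Print (do not run) one bq load command per CSV file in a folder.
# --autodetect mirrors the "Auto detect" schema checkbox in the UI.
# Assumption: the target dataset is raw_tpch, as created in the steps above.
print_load_commands() {
  dir="$1"
  for file in "$dir"/*.csv; do
    [ -e "$file" ] || continue   # skip when the glob matches nothing
    printf 'bq load --autodetect --source_format=CSV raw_tpch.%s %s\n' \
      "$(table_name_for "$file")" "$file"
  done
}

print_load_commands "_resources/tpch_dataset"
```

Run it from the repository root, check that each printed table name looks right, then paste the commands into your terminal to perform the load.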
Run dbt deps to install dependencies.
Confirm your setup:
Try running the following commands:
```
$ dbt run
$ dbt test
```
or alternatively:
```
$ dbt build
```

🎉 You're ready to move on to the next stage! 🎉

Whoa... whoa there! You can't just go slinging at the DAG like that. Here's a walkthrough to get you trained up!

Live participants:
We're asking that you don't go hopping into the walkthrough just yet! We'll be training together live! 💜

Additional Helpful Links: