Awesome dbt
Welcome to the awesome curated list of dbt resources!
Any kind of contribution is greatly encouraged and appreciated. Before contributing, please check the contribution guidelines! Add new entries at the top of sections (LIFO) to keep fresh items more visible. Also, feel free to add new sections.
Happy contributing!
Contents
- Get Started
- How To
- Integrations
- User Stories
- Data Quality
- CI/CD
- Orchestration
- Utilities
- Packages
- Community
- Sample Projects
- Contributors
Get Started
Courses to get you started with Analytics Engineering.
- The Ultimate Guide to dbt - A comprehensive canvas guide to dbt, from the basics to advanced topics.
- dbt in a real world scenario, A Beginner dbt tutorial - A beginner tutorial to understand dbt with a real world example.
- Mastering dbt: Beginner to Pro - Paid Udemy course that covers theory, building a dbt project from scratch, and deploying to dbt Cloud.
- Analytics Engineering Glossary - Living collection of terms & concepts commonly used in the data industry by dbt Labs.
- Zero to Hero dbt - Complete course covering both theory & practice through real-world Airbnb use-case.
- Data Engineering Zoomcamp - Data engineering course on cutting edge tools including dbt.
- Analytics Engineering with dbt - Paid course offered by co:rise covering the basics of dbt.
- dbt Fundamentals - Official free course offered by dbt. Excellent for learning the basics of dbt Cloud.
- Refactoring SQL for Modularity - Another free course from dbt Labs, on dbt refactoring and CTE supercharging.
- Learn DBT from Scratch - Guides you through a setup paired with Snowflake (decorated with extras).
How To
A helping hand for setting up integrations and implementing best practices.
- Discovery API use-cases - Use-cases and examples for the dbt Cloud Discovery API.
- dbt Docs as a Static Website - How to deploy dbt docs as a static website with App Engine and GitHub Actions.
- dbt Monorepo Workflow - How to get started with the team dbt workflow.
- Configuring Snowflake warehouse sizes in dbt - How to use dbt with Snowflake to allow specific warehouses to be chosen down to the model level.
- BigQuery Ingestion-Time Partitioning and Partition Copy With dbt - Combining ingestion-time partitioning and partition copy is a great way to achieve better performance for your models.
- Power up your data quality with grouped checks - How to use grouped checks in dbt-utils to keep our data "on track".
- Dry running our data warehouse using BigQuery and dbt - Use dbt & BigQuery dry run jobs to validate our 1000+ models in under 30 seconds.
- Automatically generate ERD - Automatically generate ERDs and display in your docs site.
- Business Intelligence Standards - Best practices in Business Intelligence standards for integrating with dbt.
- Jinja cheatsheet - Jinja cheatsheet for dbt development.
- Test SQL Pipelines against Production Clones using DBT and Snowflake - Leverage Snowflake zero-copy clones to run Slim CI checks.
- How we structure our dbt projects - How the dbt team structures its dbt projects.
- dbt guide - Primer on how you should properly set up and configure your dbt workflow.
- dbt for Data Transformation – Hands-on - Yet another tutorial for using dbt Cloud.
- Start Modeling Data - Configuring BigQuery with your dbt project.
- Accelerating Data Teams with dbt & Snowflake - A dbt & Snowflake workshop on financial data.
- Creating a dev environment quickly on Snowflake - Setting up the integration with Snowflake.
- How to set up a dbt data-ops workflow, using dbt cloud and Snowflake - Leverage GitHub Actions to set up CI/CD with dbt Core.
- How to configure your dbt repository - Mono-repo or not mono-repo?
- Best Practices for Optimizing Your dbt and Snowflake Deployment - Pocket guide on optimization best practices with Snowflake.
- How to Deploy dbt to Production using GitHub Actions
- Doing More With Less: Using DBT to load data from AWS S3 to Snowflake via External Tables - An alternative guide to set up your dbt-external-tables workflow.
- Best Practices for your dbt Style Guide - Standards for well organized base layer with Airbyte ingestion.
- Tips and Tricks about working with dbt - Tips from community members.
- Writing Unit Tests for dbt - An overview about the package dbt-unit-testing.
Integrations
Collection of known data integrations with dbt.
- Datafold - Gives a quick print out summary of changes so you can move fast and (not) break stuff!
- Raycast dbt Metadata - Queries the dbt Cloud API to return some useful information about your models (number of tests, time they took to run etc…).
- Cube - APIs, Caching, and Access Control on top of dbt Metrics.
- FlexIt Analytics - Business Intelligence platform with deep dbt Cloud and CLI integration.
- Raycast dbt Jobs - Raycast integration to monitor dbt Cloud Jobs.
- Metaplane - Data Observability layer on top of your dbt + BI project.
- Dbt + Machine Learning: What makes a great baton pass? - Landscape of ML utilities around dbt.
- Soda - Integration of Soda's data observability platform and dbt.
- Supported Adapters - Officially supported database adapters.
- Lightdash - Open source Looker alternative with deep dbt integration.
- Superset - Open source visualization layer for your Modern Data Stack.
- Dagster and dbt: Better Together - Getting started with the dagster-dbt library.
- fal - Add multi-language support (Python) to your dbt project.
- Privacy Dynamics - Anonymize data in your dbt project.
- prefect-dbt - Collection of Prefect integrations for working with dbt in your Prefect flows.
User Stories
Use-cases and user stories implemented by the community members using components of the MDS with dbt.
- How HomeToGo connected dbt and Superset to make metadata more accessible and reduce analytical overhead - A dbt<>Superset connector that leverages Superset's API capabilities and dbt's manifest.
- Self-service Business Intelligence - Eliminate the need for a data modeling semantic layer in BI.
- "Semantic-free" is the future of Business Intelligence - How to leverage dbt as a data catalog and semantic layer (joins, synonyms, etc.) that BI tools can just plug into.
- Building an extension framework for dbt - How Monzo built an extension framework for dbt.
- Why I moved my dbt workloads to GitHub and saved over $65,000 - Save by replacing dbt Cloud with GitHub Actions.
- “Is This You?” Entity Matching in the Modern Data Stack with Large Language models - An experiment in productionizing LLMs.
- Leveraging DBT as a Data Modeling tool - Reflection on one-year usage of dbt.
- dbt + Materialize: Streaming to a dbt project near you - How to own your real-time transformation workflows like batch-based alternatives.
- Who's really using dbt? - Behind the community of analytics engineers.
- dbt and the Analytics Engineer — what's the hype about - Behind the upheaval of the analytics engineer profession.
- Analyzing Fishtown's dbt project performance with artifacts - Using project artifacts to identify anomalies and room for refactoring.
- Deploying and Running dbt on Azure Container Instances - Demonstration of integration with Azure.
- Beware of DBT Incremental Updates Against Snowflake External Tables - Things you should be aware of when using external tables with dbt.
- dbt development at Vimeo - Best practises from the Vimeo Data team.
Data Quality
Best-practices and extensions of the testing framework.
- dq-tools - Makes it simple to store test results and visualise them in a BI dashboard, leveraging 6 Data Quality KPIs.
- BigQuery Stale data detection - Stale data detection with dbt and BigQuery dataset metadata.
- PipeRider - PipeRider allows you to define the shape of your data once, and then use the data checking functionality to alert you to changes in your data quality.
- Elementary - A dbt package that provides data anomaly detection as dbt tests.
- Environment-dependent Unit Testing in dbt - Guide on how to run unit tests in dbt dynamically.
- dbt-expectations - Port between dbt and great_expectations to extend out-of-the-box tests.
- re_data - A dbt package for monitoring metrics and detecting anomalies.
- How do you test your data - Suggestions on testing your data powered by the community.
- How to unit test sql transforms in dbt - Unit test using source defer and generic custom tests.
CI/CD
Get the most out of product quality and seamless delivery.
- Autoscaling CI - The intelligent Slim CI.
- Slim CI/CD with Bitbucket Pipelines - How to set up Slim CI on Bitbucket.
- dbt-docs-to-notion - A GitHub action for exporting dbt model docs to a Notion database.
- How to review an analytics pull request - Checkpoints to consider when reviewing an analytics engineer PR.
- Continuous Integration and Automated Build Testing with dbtCloud - Great and detailed blogpost on setting up Slim CI in dbt Cloud.
- Performing a blue/green deploy of your dbt project on Snowflake - A very tidy and fail-safe way to run dbt in production by using two parallel production environments.
- How we speed up our CI runs by 10x using Slim CI - Limit data in long-running CI checks to improve the development experience.
Orchestration
Resources to manage and maintain dependencies in modern data pipelines.
- Building a Scalable Analytics Architecture with Airflow and dbt - Leveraging the dbt manifest in Airflow.
- Auto-generating an Airflow DAG using the dbt manifest - Yet another article on extracting value from the manifest file.
- Building a robust data pipeline with the dAG stack: dbt, Airflow, Great Expectations - Demonstration of a data orchestration project with Airflow.
- Run dbt in Azure Data Factory - Primer about dbt on Azure Data Stack.
Utilities
Useful tools and extensions to bump up your analytics engineer workflow.
- Jinjat - Low-code application framework that turns your dbt projects into web apps.
- fst: flow state tool - A tool to help you stay in flow state while developing dbt models.
- dbt_tld - A self-updating dbt library that will maintain a list of current IANA/ICANN recognized top level domains.
- dbt-model-finder - A Streamlit web app to find currently running dbt models.
- dbtc Explorer - A Streamlit web app to explore the dbt Cloud API.
- dbt-feature-flags - Feature Flags in dbt models.
- dbtpal - A Neovim plugin for dbt model editing.
- cookiecutter-dbt - Cookiecutter template for dbt projects.
- turbovault4dbt - TurboVault4dbt is an open source tool that automatically generates dbt models according to datavault4dbt-templates.
- dbtvault-generator - Generate DBT Vault files from yml metadata (supporting dbtvault package).
- dbt-container-skeleton - All the basics to get a nice containerized dbt development environment.
- oliver-twist - DAG auditing tool that audits the DBT DAG and generates a summary report.
- dbt-sql-formatter - Makes your sql less bad.
- dbterd - CLI to generate DBML file from dbt manifest.json.
- dbt-cue - Generate dbt yml files using the CUE language.
- VSC - Wizard for dbt Core - This extension accelerates your first-time environment setup with dbt Core, and optimizes your continual development of transformation pipelines.
- dbt-artifacts-parser - It enables us to deal with catalog.json, manifest.json, run-results.json and sources.json as python objects.
- GitHub Action: Cancel Running CI Job - Keeps the newest code commit running in the CI job without having to wait for stale job runs to finish.
- dbtc - Unaffiliated python interface to various dbt Cloud API endpoints.
- dbt-osmosis - Enhance the developer experience significantly with workbench, output diffs, and YAML management.
- pytest-dbt-core - Pytest dbt core is a pytest plugin for testing your dbt projects.
- looker-gen - Generate lookml from dbt.
- dbtenv - A version manager for dbt.
- sqlfmt - This tool formats your dbt SQL code so you don't have to.
- SQLFluff - SQL linter that supports dbt and Jinja templating.
- Build Data Access Layer on dbt - Package to build GraphQL API on top of your dbt project.
- Run changed models based on Git status - Handy bash function to run changed models since last commit.
- How we set up our computers for working on dbt projects - Things I wish I had known when I started working with dbt. Tools and hacks to improve the development experience.
- fzf-dbt - Search dbt models interactively from terminal.
- vscode-dbt-power-user - VSCode extension to give more clarity on model dependencies.
- Your Essential dbt Project Checklist - Checklist on items necessary for a successful dbt project.
- dbt Style Guide - Development style guide often referenced in PR templates.
- Clean your warehouse of old and deprecated models - Clean out warehouse models that no longer exist in the project.
- dbt-tips - Excellent companion to your dbt practice with rich collection of tips.
- dbt-tags - Understanding the scopes of dbt tags.
- Pre-commit hooks - Pre-commit hooks for checking data integrity before schema change commit.
Packages
Community-developed packages to extend default macros and toolset.
- dbt-census-utils - A collection of dbt macros for working with Census data.
- dbt-fabric - A dbt adapter for working with Microsoft Fabric Data Warehouses.
- dq-vault - Data Quality Observation of Data Vault layer.
- dbt-translate - Translate numbers into words.
- dbt-excel - A dbt adapter for working with Excel.
- dbt_linreg - Linear regression in SQL using dbt.
- dbt-snowflake-query-tags - Automatically tag dbt-issued queries with informative metadata.
- snowflake-resource-monitoring - Yet another package to monitor Snowflake usage.
- usagedata - Provides insights on database/table level usage information from Snowflake.
- dbt_ml - Package for dbt that allows users to train, audit and use BigQuery ML models.
- ddbt - This repo represents my attempt to build a fast version of DBT which gets very slow on large projects (3000+ data models). This project attempts to be a direct drop in replacement for DBT at the command line.
- dbt-snowflake-monitoring - A dbt package to help you monitor Snowflake performance and costs.
- datavault4dbt - Macros for staging and creation of all DataVault-Entities you need, to build your own DataVault2.0 solution.
- DDO - Perform DataOps & administrative CI/CD on your data warehouse.
- dbt-yaml-check - Checks that columns defined in YAML also exist in SQL.
- data-diff - A command-line tool and Python library to efficiently diff rows across two different databases.
- dbt-project-evaluator - This package highlights areas of a dbt project that are misaligned with dbt Labs' best practices.
- dbt_constraints - Generate database constraints based on the tests in a dbt project.
- dbt-date - Date logic and calendar functionality.
- dbt-privacy - Macros to make it easier to protect your customers' data.
- dbt-fivetran-utils - General macros and helpers.
- dbt_metrics - Macros to support secondary calculations and generate business metrics.
- dbt-metabase - Model synchronization from dbt to Metabase.
- dbt-coves - CLI tool for generating a scaffold for your dbt project.
- dbt-profiler - Data profiling and doc block generator.
- dbt_utils - General macros library. A must have.
- dbt_audit_helper - Macros for data audits that compare columns values and schemas between tables.
- dbt-ml-preprocessing - A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros.
- dbt-external-tables - Macros to stage your external sources.
- dbt-feature-store - Macros to build a feature store right within your dbt project.
- dbt-codegen - Macros that generate dbt code, and log it to the command line.
- dbt-init - Create a project and populate as much of the dbt project as possible.
- dbt-artifacts - This package builds a mart of tables from dbt artifacts loaded into a table.
- dbt-erdiagram-generator - This package generates ERD diagrams from a dbt project.
- Terraform-dbt Cloud Module - IAC in dbt Cloud via Terraform.
- dbt2looker - Generate Looker views for dbt models.
- dbt-coverage - Checks dbt docs and tests coverage.
- dbt-meta-testing - Yet another coverage testing.
- dbt-superset-lineage - Push and pull metadata between dbt and Superset.
- dbtvault - Package for generating and executing ETL for Data Vault 2.0.
- dbt-invoke - CLI for creating, updating, and deleting dbt property files.
- dbt-unit-testing - Package which contains macros to support unit testing.
Community
Conferences, meetups, discussions, newsletters, podcasts, etc. led by fellow analytics engineers and forums of contact.
- Data Council Austin 2023 - A conference for data teams.
- State of Analytics Engineering 2023 - A survey of pains, gains, and areas of investment for global data teams.
- dbt Labs Tiktok - Official TikTok channel of dbt Labs.
- Locally Optimistic - A Slack community of aspiring analytics leaders discussing and sharing lessons learned and challenges from their experiences in using data.
- DataTalks.Club - Global online community of data enthusiasts. Podcasts and blogs, etc. are distributed with high frequency.
- Metadata Weekly - Weekly substack about metadata, the metrics layer and MDS.
- Data & Analytics Events in 2022 - Great curated list of upcoming data analytics conferences.
- Data Council Austin 2022 - Worldwide community driven analytics conference with a handful of talks fitting to the dbt stack.
- Discourse v2 - Revamped and ported hub of main discussions for the community.
- Coalesce 2021 - Second iteration of the analytics engineer conference.
- Coalesce 2020 - Annual dbt conference full of fascinating use-cases.
- dbt meetups - List of community led dbt meetups.
- Analytics Engineer Roundup - Official dbt Labs newsletter on topics of the MDS.
- Benn Stancil's Newsletter - Thought-provoking reads from the founder of Mode.
- Data Engineering Weekly - Weekly newsletter of recent trends in Data Engineering.
- Data Engineering Podcast - One of the most popular data engineering podcasts covering great concepts and new products.
- Analytics Engineer Podcast - Official podcast of dbt Labs.
- dbt Slack - Energy-filled hub of analytics engineers (Highly recommended).
- r/dataengineering - Subreddit of data engineering topics.
- Drill to Detail Podcast - Special guests discussing big data, business intelligence, modern data stack.
Sample Projects
Sample projects that work out of the box and reflect publicly available use-cases.
- dbt_workspace - A workspace template for dbt demos.
- Cloud Cost Monitoring - A dbt project to monitor cloud costs.
- Analytics Engineer Survey 2023 - Repo containing data and dbt template of the survey.
- Tracking the Fake GitHub Star Black Market with Dagster, dbt and BigQuery - Explore the topic of fake GitHub stars.
- Data-aware orchestration - Dagster's ability to create a global dependency graph between different dbt projects.
- GitLab Data Team - GitLab's open source dbt project.
- attribution-playbook - A worked example to demonstrate how to model customer attribution.
- mrr-playbook - A worked example to demonstrate how to model subscription revenue.
- Use dbt inside Visual Studio Code development containers - Set up your dbt environment with pre-installed extensions.
- dag-stack - Dbt-Airflow-GreatExpectations Stack.
- Jaffle Shop - A self-contained dbt project for testing purposes.
- Spotify User Analytics - Sample dbt project with Spotify user data.
- dbt-github-workflow - Deploy BigQuery + Airflow.
- airflow-dbt-demo - Demonstration of Airflow integration.
- aws athena x dbt - How to build a small and modern data infrastructure.
- dbt on AWS - Data Build Tool (dbt) for Effective Data Transformation on AWS.
Contributors
Thanks for all the great resources! Can't see your avatar? Check the contribution guide on how you can submit your resources to the community!