
Demo project for dbt on Databricks

License: Apache License 2.0

dbt + Databricks Demo!

This is a modified version of our public tutorial intended for users of dbt on Databricks.

Any questions? jeremy@fishtownanalytics.com

Sample data

Create Databricks tables jaffle_shop.orders, jaffle_shop.customers, and stripe.payments from these CSV files, which are located in a public S3 bucket (docs):

s3://dbt-tutorial-public/jaffle_shop_orders.csv
s3://dbt-tutorial-public/jaffle_shop_customers.csv
s3://dbt-tutorial-public/stripe_payments.csv
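Here is a sketch of one way to create these tables. It is not necessarily how the tutorial does it. The script prints Spark SQL statements that you could paste into a SQL cell in a Databricks notebook. It makes three assumptions: the CSVs include a header row, your cluster can read the public bucket, and column types inferred by Spark are acceptable. The script layout and its `emit_ddl` function are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Spark SQL DDL that registers the three sample
# CSVs as Databricks tables. Assumes header rows and S3 read access.
set -euo pipefail

BUCKET="s3://dbt-tutorial-public"

# "schema.table file-name" pairs, taken from the list of files above
TABLES=(
  "jaffle_shop.orders jaffle_shop_orders.csv"
  "jaffle_shop.customers jaffle_shop_customers.csv"
  "stripe.payments stripe_payments.csv"
)

emit_ddl() {
  local entry table file
  # printf reuses its format for each argument: one statement per schema
  printf 'CREATE SCHEMA IF NOT EXISTS %s;\n' jaffle_shop stripe
  for entry in "${TABLES[@]}"; do
    # Split each pair into the table name and the CSV file name
    read -r table file <<< "$entry"
    printf "CREATE TABLE IF NOT EXISTS %s\nUSING csv\nOPTIONS (path '%s/%s', header 'true', inferSchema 'true');\n" \
      "$table" "$BUCKET" "$file"
  done
}

emit_ddl
```

Running it prints two `CREATE SCHEMA` statements followed by one `CREATE TABLE` per file. A `CREATE TABLE ... USING csv` with a `path` option defines a table that reads directly from S3 and does not copy the data. `inferSchema 'true'` keeps the sketch short, but Spark makes an extra pass over each file to guess the column types.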

Getting started

The instructions below assume you are running dbt on macOS. Linux and Windows users should adjust the bash commands accordingly.

  1. Clone this GitHub repo
  2. Install dbt-spark:
$ pip install dbt-spark
  3. Copy the example profile to your ~/.dbt folder (created when installing dbt):
$ cp ./sample.profiles.yml ~/.dbt/profiles.yml
  4. Populate ~/.dbt/profiles.yml with your Databricks host, API token, cluster ID, and schema name:
$ open ~/.dbt
  5. Verify that you can connect to Databricks:
$ dbt debug
  6. Verify that you can run dbt:
$ dbt run
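You can also script the profile step (step 4). The sketch below renders a profile for dbt-spark's `http` connection method and fills in the four fields that step asks for. Several names in it are hypothetical: the profile name `dbt_databricks_demo`, the environment variable names, and all placeholder values. The profile name must match the `profile:` entry in this project's dbt_project.yml. If the two differ, follow the repo's sample.profiles.yml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a dbt-spark profile (http method) for Databricks.
# Nothing is written to disk; redirect the output yourself if you want a file.
set -euo pipefail

render_profile() {
  # Hypothetical env var names; each falls back to an obvious placeholder
  local host="${DATABRICKS_HOST:-your-workspace.cloud.databricks.com}"
  local token="${DATABRICKS_TOKEN:-your-api-token}"
  local cluster="${DATABRICKS_CLUSTER_ID:-your-cluster-id}"
  local schema="${DBT_SCHEMA:-dbt_dev}"
  # Unquoted EOF so the four values above are substituted into the YAML
  cat <<EOF
dbt_databricks_demo:
  target: dev
  outputs:
    dev:
      type: spark
      method: http
      host: ${host}
      port: 443
      token: ${token}
      cluster: ${cluster}
      schema: ${schema}
      connect_timeout: 60
      connect_retries: 5
EOF
}

render_profile
```

The function only prints YAML, so you decide where the output goes. Redirecting it to ~/.dbt/profiles.yml overwrites the copied sample. The file then holds your API token in plain text, so keep it out of version control.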

Resources:

  • Learn more about dbt in the docs
  • Check out Discourse for commonly asked questions and answers
  • Join the chat on Slack for live discussions and support
  • Find dbt events near you
  • Check out the blog for the latest news on dbt's development and best practices
  • Watch our Office Hours on dbt + Spark