Welcome! This project uses dbt (data build tool) to transform raw data into analytics-ready tables in your data warehouse. Below you'll find information on how to set up and use this project effectively.
- `models`: Contains dbt models that define transformations on your raw data.
- `tests`: Contains dbt tests to ensure data quality and integrity.
- `analysis`: Contains SQL files for ad-hoc analysis and reporting.
- `macros`: Contains reusable dbt macros for common tasks.
- `snapshots`: Contains dbt snapshots for capturing point-in-time data.
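As an illustration of what lives in `models/`, a dbt model is a single SQL `SELECT` statement, optionally preceded by a Jinja `config` block. The file name, column names, and source table below are hypothetical, not taken from this project:

```sql
-- models/my_first_model.sql (hypothetical example)
-- Materialize the result as a view in the target schema.
{{ config(materialized='view') }}

select
    id,
    created_at
from raw_events  -- hypothetical raw table in your warehouse
```

dbt creates a relation named after the file (here, `my_first_model`) in the schema configured in your profile.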
The instructions below assume you are running dbt on macOS. Linux and Windows users should adjust the bash commands accordingly.
- Clone this GitHub repo
- Install dbt-spark:
$ pip install dbt-spark
- Copy the example profile to your `~/.dbt` folder (created when installing dbt):
$ cp ./sample.profiles.yml ~/.dbt/profiles.yml
- Populate `~/.dbt/profiles.yml` with your Databricks host, API token, cluster ID, and schema name:
$ open ~/.dbt
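For reference, a populated profile for dbt-spark connecting to Databricks over its HTTP method generally looks like the sketch below. The profile name (`my_project`) must match the `profile:` entry in this project's `dbt_project.yml`, and all values shown are placeholders, not real credentials:

```yaml
# ~/.dbt/profiles.yml (sketch; placeholder values)
my_project:
  target: dev
  outputs:
    dev:
      type: spark
      method: http
      host: dbc-XXXXXXXX.cloud.databricks.com  # your Databricks workspace host
      port: 443
      token: dapiXXXXXXXX                      # your Databricks API token
      cluster: 1234-567890-abcd123             # your cluster ID
      schema: my_schema                        # schema dbt will build into
      threads: 1
```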
- Verify that you can connect to Databricks:
$ dbt debug
- Verify that you can run dbt:
$ dbt run
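Once `dbt run` succeeds, you can add schema tests (the kind that live under the `tests` conventions mentioned above) and execute them with `dbt test`. A minimal sketch, assuming a model named `my_first_model` with an `id` column (both hypothetical):

```yaml
# models/schema.yml (sketch)
version: 2

models:
  - name: my_first_model
    columns:
      - name: id
        tests:
          - not_null  # fail if any id is NULL
          - unique    # fail if any id appears more than once
```

`not_null` and `unique` are built-in dbt generic tests; `dbt test` compiles each into a query and fails the test if that query returns any rows.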