/GSynergyDEChallenge

Primary LanguagePythonApache License 2.0Apache-2.0

GSynergy Data Engineering Challenge

Welcome! This project utilizes dbt (data build tool) to transform raw data into analytics-ready tables in your data warehouse. Below, you'll find information on how to set up and use this project effectively.

Project Structure

  • models: Contains dbt models that define transformations on your raw data.
  • tests: Contains dbt tests to ensure data quality and integrity.
  • analysis: Contains SQL files for ad-hoc analysis and reporting.
  • macros: Contains reusable dbt macros for common tasks.
  • snapshots: Contains dbt snapshots for capturing point-in-time data.

Setup Instructions

The instructions below assume you are running dbt on macOS. Linux and Windows users should adjust the bash commands accordingly.

  1. Clone this github repo
  2. Install dbt-spark: pip install dbt-spark
  3. Copy the example profile to your ~/.dbt folder (created when installing dbt):
$ cp ./sample.profiles.yml ~/.dbt/profiles.yml
  1. Populate ~/.dbt/profiles.yml with your Databricks host, API token, cluster ID, and schema name
open ~/.dbt
  1. Verify that you can connect to Databricks
$ dbt debug
  1. Verify that you can run dbt
$ dbt run