/siren_demo

Primary LanguageDockerfile

Siren demo

  • Clickhouse (Database)
  • dbt (Transformations)
  • Cube (Metrics) + Metabase (Business Intelligence)

Table of Contents

  1. Configure environment
  2. Install dbt packages
  3. Configure data sources
  4. Build staging layer models
  5. Build intermediate layer models
  6. Build a data mart
  7. Set up metrics layer with Cube
  8. Visualize metrics

Configure environment

  1. Install prerequisites:

  2. Fork & Clone this repository and open in IDE

  3. Initialize ENV variables with .env file

    .env is used to store secrets as environment variables.

    Copy template file .env.template to .env file:

    cp .env.template .env

    Open file in editor and set your own values.

    ❗️ Never commit secrets to git

  4. Spin up Docker containers

    All the services are configured via Docker containers.

    • devcontainer
    • Clickhouse
    • Metabase
    • Cube
    # build dev container
    devcontainer build .
    
    # open dev container
    devcontainer open .

  5. Verify you are in a development container by running commands:

    dbt --version

    If any of these commands fails printing out used software version then you are probably running it on your local machine not in a dev container!

Install dbt packages

  1. Install modules via packages.yml

    dbt clean # clean temp files
    dbt deps # install dependencies (modules)

    We will use dbt-labs/dbt_utils package for this project.

  2. Turn on custom schema management

    I use generate_schema_name.sql macro to set target schema names:

    Renders a schema name given a custom schema name. In production, this macro will render out the overriden schema name for a model. Otherwise, the default schema specified in the active target is used.

    Take a look at custom_schema_management.sql macro to find out more.

Configure data sources

  1. Create table from Parquet file with source data:

    dbt run-operation init_source_data

    Initialized myBI source database

  2. Seed .csv files with data (CRM snapshots):

    dbt seed

Build DWH

DWH is structured like this for simplicity:

  siren_demo:
    staging:
      +schema: staging
      +tags: ['staging']
    intermediate:
      +schema: intermediate
      +tags: ['intermediate']
    marts:
      +schema: analytics
      +tags: ['analytics']

Build staging layer models

  • Ensure data quality with expectations testing with staging.yml
  • Generate surrogate keys {{ dbt_utils.generate_surrogate_key(['phone_hash', 'email_hash']) }} as lead_hash
  • Union all .csv data with dbt_utils.union_relations()
  • Remove header rows ENTRYDATE not in ('-----')
  • Parse timestamps parseDateTimeBestEffortOrNull(ENTRYDATE) AS entry_dt
  • Coalesce data COALESCE(CITY, CityName, trim(splitByChar('|', assumeNotNull(location))[1]))
dbt build -s tag:staging

Build staging models

Build intermediate layer models

  • Set Clickhouse table engine to ReplacingMergeTree()
  • Force deduplicate rows with post-hook post_hook=['OPTIMIZE TABLE {{ this }} FINAL DEDUPLICATE BY lead_hash, entry_dt, appt_dt']
  • Ensure data quality with expectations testing with intermediate.yml
dbt build -s tag:intermediate

Build intermediate models

Build a data mart

  • Join data from backend and customer CRM
  • Ensure data quality with expectations testing with marts.yml
dbt build -s tag:analytics

Build a data mart

Set up metrics layer with Cube

Visualize metrics

Now we are ready to visualize key metrics on a dashboard.

Cube Marts Cube Marts