/dbt-cue

Generate dbt yml files using the CUE language

Primary LanguageCUE

dbt-cue

Generate dbt yml files using the CUE language

About

I personally don't love YAML and find it error prone. I use the CUE language to generate and validate all YAML files for just about everything I do at this point (Kubernetes, config, etc.) This repo demonstrates using CUE to generate dbt YAML files. There are a lot more checks to be added and you can easily clone and add your own validations as well, but this is a start.

Minimum Requirements

Getting Started

The schema.cue file defines the types and values that are used to build the dbt yaml files. For now, I've only worked through building the Source yaml file. I will add more later. The schema.cue file is for shared type and value definitions only and should not be edited.

User input goes into a src_ file (though the name doesn't matter), where the various objects are built.

sources: #sourceList & [_jaffleShop]

_jaffleShop: #Source & {
  name:        "jaffle_shop"
  description: "Jaffle Shop is the testing source for dbt"
  database:    "dbt-tutorial"
  schema:      "jaffle_shop"
  meta: [{"contains_pii": true, "owner": "@alice"}]
  tags: ["ecom", "pii"]
  quoting: #Quoting & {
    database:   false
    schema:     false
    identifier: false
  }
  tables: [_jaffleOrderTable, _jaffleCustomerTable]
}

_jaffleOrderTable: #Table & {
  name:            "orders"
  identifier:      "Orders_"
  loaded_at_field: "updated_at"
  columns: [
    #Column & {
      name: "id"
      tests: ["unique", "not_null"]
    },
    #Column & {
      name: "status"
      tests: [#AcceptedValues & {
        accepted_values: {
          values: ["placed", "shipped", "completed", "returned"]
        }
      }]
    },
    #Column & {
      name: "price_in_usd"
      tests: ["not_null"]
    },
    #Column & {
      name: "customer_id"
      tests: [
        #Relationship & {
          relationships: {
            to:    "ref('customers')"
            field: "id"
          }
        },
      ]
    },
  ]
}

_jaffleCustomerTable: #Table & {
  name:    "customers"
  quoting: #Quoting & {
    identifier: true
  }
}

Creating CUE values reads well as it is a superset of JSON. Here we are creating a _jaffleShop value, which is a dbt Source (see schema.cue) for the #Source definition. We can add whatever key value pairs we like as meta, any list of strings for tags, etc. For more complex structures like a table or quoting, we use CUE definitions, there are examples here of creating them inline as well as separate entities.

File Generation

The CUE cli is used to vet (validate the data against the schema), fmt (format - which nicely standardizes the input files) and finally export the data to the named file. CUE works with the concept of packages, so you can merge multiple files together. As noted before, I put my general types and values in the schema.cue file and implement those types and values in a separate file (in this case src_jaffleshop.cue).

cue vet schema.cue src_jaffleshop.cue
cue fmt schema.cue src_jaffleshop.cue
cue export schema.cue src_jaffleshop.cue --force --out yaml --outfile _jaffle_shop__sources.yml

For each command (vet, fmt, export) I am passing the list of files to be used when generating the output. --out is what type of output (yaml, json, etc.) The --outfile is what we want to name our output file.

I provided 2 different sample bash scripts (run_jaffleshop.sh and run_stripe.sh) that you can use to generate output or, if you prefer, you can input the cli commands yourself.

Output

The below output is from CUE, you'll find it basically a match (minus spaces) of the Source example. I have added some additional examples beyond the example to demonstrate things like using accepted_values tests, etc.

version: 2
sources:
  - name: jaffle_shop
    description: Jaffle Shop is the testing source for dbt
    database: dbt-tutorial
    schema: jaffle_shop
    meta:
      - contains_pii: true
        owner: '@alice'
    tags:
      - ecom
      - pii
    quoting:
      database: false
      schema: false
      identifier: false
    tables:
      - name: orders
        identifier: Orders_
        loaded_at_field: updated_at
        columns:
          - name: id
            tests:
              - unique
              - not_null
          - name: status
            tests:
              - accepted_values:
                  values:
                    - placed
                    - shipped
                    - completed
                    - returned
          - name: price_in_usd
            tests:
              - not_null
          - name: customer_id
            tests:
              - relationships:
                  to: ref('customers')
                  field: id
      - name: customers
        quoting:
          identifier: true