/dbt-metalog

Create customizable models from your metadata.

MIT LicenseMIT

πŸ““ dbt-metalog: Your metadata's catalog

Have you ever found yourself jotting down essential business rules, questions, technical owners, or ToDos in a separate Excel sheet β€Š-β€Šfar from your code -, only to forget about them later?

Do you struggle to keep track of all the metadata that's critical for effective data management and analysis?

Worry no more! We have a solutionβ€Š-β€Šdbt-metalog.

Create lightweight and totally customizable models from your metadata.

Easily create models for:

βœ… Business rules.

βœ… Business questions.

βœ… Tech owners.

βœ… Requesting areas/persons.

βœ… Date the model was created.

βœ… ToDo's.

βœ… Any metadata you want...

Choose your metadata by:

βœ… name.

βœ… resource type.

βœ… file.

πŸ”Ž Content

πŸƒ Quickstart

New to dbt packages? Read more about them here.

Requirements

dbt version

  • dbt version >= 1.0.0

Installation

  1. Include this package in your packages.yml file.
packages:
  - package: techindicium/metalog
    version: 1.0.0
  1. Run dbt deps to install the package.

Package Limitations

Warning If your project is too large (too many models with too many metadata), there is a chance the generated SQL by the macros exceed the query length limit of your DW. Then you will get an error.

βš™οΈ Macros

create_metadata_model (source)

This macro generates SQL for creating customizable tables or views from the metadata of your nodes and sources. You have the flexibility to select the specific metadata you want to include in your table or view. If a node does not contain the specified metadata, it will be displayed as "Undefined", but you can alter this default text to your preference.

Note For this README, every time you see "node" consider also "sources"

The macro will check the metadata defined in your nodes. If you are new to metadata in dbt, check the documentation here.

Nodes can be:

  • Models
  • Sources
  • Seeds
  • Snapshots
  • Tests
  • Analyses
  • Macros

For a model, you can define the metaconfig in

  • Your model, inside the config block.
  • A config property, in a .yml file
  • The dbt_project.yml under configs under models.

For others resource types, check the docs.

Warning: Currently this package does not supports dicts in the meta config, just single values or lists.

Arguments

  • metadata (required): A list of the metadata which will be the columns of your model.
  • granularity (optional) (default = []): A list of th metadata which must be separated in different rows. They must be wrote in the meta config as lists.
  • resource_type (optional) (default = ['model']): A list of the resource types you want to read the metadata from. Options:
    • model
    • source
    • seed
    • snapshot
    • tests
    • analysis
    • macros
  • undefined (optional) (default = 'Undefined'): A string which overrides the default string shown when the metadata is not found for that model.
  • undefined_as_null (optional) (default = 'False'): A booelan, when True undefined metadata will be displayed as null.
  • show_resource_type(optional) (default = True): A boolean to show or hide the resource_type column in your resulting model.
  • files(optional) (default = []): A list of regex specifying the files to include, e.g. If you want to include all files in models, then files=['models/.*'].
  • exclude_files(optional) (default = []): A list of regex specifying the files to exclude, e.g. If you want to exclude all staging files, then files=['.*stg_.*'].

Usage

Define the metadata in your nodes

So, for example take a look at the dummy_model_1 inside the integration_tests folder

{{ config(
    meta={
        'main_subject': 'sales'
        , 'owner': 'Alice'
        , 'business_questions': [
            'How many stores of type ...?'
            , 'How many stores in ...?'
        ]
        , 'business_rules': [
            'Stores of type A receive code B ...'
            , 'Consider only stores open after ...'
        ]
        , 'todos': 'change to incremental'
    }
)}}

select 1 as dummy

Create a model which uses the create_metadata_model macro.

Use the create_metadata_model macro passing as argument a list of the metadata you want to include in your model. Let's see the 01_metadata_test model (You can create a model with any name you want) as an example:

{{ metalog.create_metadata_model(
        metadata = [
            "main_subject"
            , "owner"
            , "business_questions"
            , "business_rules"
            , "todos"
        ]
)}}

Note: The default materialization for dbt models is view. If you want to change to table, change the materialized configuration property .

Run your model

Just run it!

dbt run -s metadata_view

Note Suppose we have the following nodes in our project: dummy_model_1, dummy_model_2 and dummy_seed

The output view, using the meta defined in our nodes will be:

resource_name resource_type main_subject owner business_questions business_rules todos
dummy_model_1 model sales Alice "['How many stores of type ...?', 'How many stores in ...?']" "['Stores of type A receive code B ...', 'Consider only stores open after ...']" change to incremental
dummy_model_2 model people Bob How many employees ...? Undefined Undefined

Additional customization

granularity

If you want to break a metadata into different rows, you can use the granularity_list argument. For example, if you want to break the business_questions meta into different lines, change your model to:

{{ metalog.create_metadata_model(
        metadata = [
           "main_subject"
            , "owner"
            , "business_questions"
            , "business_rules"
            , "todos"
        ]
        , granularity = [
            "business_questions"
        ]
)}}

Now you have only a business question per row.

resource_name resource_type main_subject owner business_questions business_rules todos
dummy_model_1 model sales Alice How many stores of type ...? "['Stores of type A receive code B ...', 'Consider only stores open after ...']" change to incremental
dummy_model_1 model sales Alice How many stores in ...? "['Stores of type A receive code B ...', 'Consider only stores open after ...']" change to incremental
dummy_model_2 model people Bob How many employees ...? Undefined Undefined

You can pass more than one metadata to the granularity_list, for example:

02_metadata_test_with_granularity

{{ metalog.create_metadata_model(
        metadata = [
           "main_subject"
            , "owner"
            , "business_questions"
            , "business_rules"
            , "todos"
        ]
        , granularity = [
            "business_questions"
            , "business_rules"
            , "todos"
        ]
)}}

Now each row have a unique business question, business rule and a todo.

resource_name resource_type main_subject owner business_questions business_rules todos
dummy_model_1 model sales Alice How many stores of type ...? Stores of type A receive code B ... change to incremental
dummy_model_1 model sales Alice How many stores of type ...? Consider only stores open after ... change to incremental
dummy_model_1 model sales Alice How many stores in ...? Stores of type A receive code B ... change to incremental
dummy_model_1 model sales Alice How many stores in ...? Consider only stores open after ... change to incremental
dummy_model_2 model people Bob How many employees ...? Undefined Undefined

resource_type

You can ask the macro to include metadata of more resource types with the resource_typeargument. Let's include seeds along with models.

03_metadata_test_resource_type

{{ metalog.create_metadata_model(
        metadata = [
           "main_subject"
            , "owner"
            , "business_questions"
            , "business_rules"
            , "todos"
        ]
        , granularity = [
            "business_questions"
            , "business_rules"
            , "todos"
        ]
        , resource_type = [
                "model"
                , "seed"
        ]
)}}

Now you can see also the metadata from the dummy_seed.

resource_name resource_type main_subject owner business_questions business_rules todos
dummy_model_1 model sales Alice How many stores of type ...? Stores of type A receive code B ... change to incremental
dummy_model_1 model sales Alice How many stores of type ...? Consider only stores open after ... change to incremental
dummy_model_1 model sales Alice How many stores in ...? Stores of type A receive code B ... change to incremental
dummy_model_1 model sales Alice How many stores in ...? Consider only stores open after ... change to incremental
dummy_model_2 model people Bob How many employees ...? Undefined Undefined
dummy_seed seed sales Carl Undefined Undefined Undefined

show_resource_type

You can hide the resource_type column passing the argument show_resource_type=False

04_metadata_test_show_resource_type

{{ metalog.create_metadata_model(
        metadata = [
           "main_subject"
            , "owner"
            , "business_questions"
            , "business_rules"
            , "todos"
        ]
        , granularity = [
            "business_questions"
            , "business_rules"
            , "todos"
        ]
        , resource_type = [
                "model"
                , "seed"
        ]
        , show_resource_type = False
)}}

The resource_type column was removed

resource_name main_subject owner business_questions business_rules todos
dummy_model_1 sales Alice How many stores of type ...? Stores of type A receive code B ... change to incremental
dummy_model_1 sales Alice How many stores of type ...? Consider only stores open after ... change to incremental
dummy_model_1 sales Alice How many stores in ...? Stores of type A receive code B ... change to incremental
dummy_model_1 sales Alice How many stores in ...? Consider only stores open after ... change to incremental
dummy_model_2 people Bob How many employees ...? Undefined Undefined
dummy_seed sales Carl Undefined Undefined Undefined

undefined

You can override the default 'Undefined' string with undefined.

05_metadata_test_undefined

{{ metalog.create_metadata_model(
        metadata = [
           "main_subject"
            , "owner"
            , "business_questions"
            , "business_rules"
            , "todos"
        ]
        , granularity = [
            "business_questions"
            , "business_rules"
            , "todos"
        ]
        , resource_type = [
                "model"
                , "seed"
        ]
        , show_resource_type = True
        , undefined = "Not defined"
)}}

The undefined metadata are displayed as 'Not defined'.

resource_name resource_type main_subject owner business_questions business_rules todos
dummy_model_1 model sales Alice How many stores of type ...? Stores of type A receive code B ... change to incremental
dummy_model_1 model sales Alice How many stores of type ...? Consider only stores open after ... change to incremental
dummy_model_1 model sales Alice How many stores in ...? Stores of type A receive code B ... change to incremental
dummy_model_1 model sales Alice How many stores in ...? Consider only stores open after ... change to incremental
dummy_model_2 model people Bob How many employees ...? Not defined Not defined
dummy_seed seed sales Carl Not defined Not defined Not defined

undefined_as_null

You can also set the undefined metadata to appear as null values with undefined_as_null

06_metadata_test_undefined_as_null

{{ metalog.create_metadata_model(
        metadata = [
           "main_subject"
            , "owner"
            , "business_questions"
            , "business_rules"
            , "todos"
        ]
        , granularity = [
            "business_questions"
            , "business_rules"
            , "todos"
        ]
        , resource_type = [
                "model"
                , "seed"
        ]
        , show_resource_type = True
        , undefined = "Not defined"
        , undefined_as_null = True
)}}

The undefined metadata are displayed as null.

resource_name resource_type main_subject owner business_questions business_rules todos
dummy_model_1 model sales Alice How many stores of type ...? Stores of type A receive code B ... change to incremental
dummy_model_1 model sales Alice How many stores of type ...? Consider only stores open after ... change to incremental
dummy_model_1 model sales Alice How many stores in ...? Stores of type A receive code B ... change to incremental
dummy_model_1 model sales Alice How many stores in ...? Consider only stores open after ... change to incremental
dummy_model_2 model people Bob How many employees ...?
dummy_seed seed sales Carl

files

You can select the files you want to include (files) or to exclude (exclude_files)

07_metadata_test_files

{{ metalog.create_metadata_model(
        metadata = [
           "main_subject"
            , "owner"
            , "business_questions"
            , "business_rules"
            , "todos"
        ]
        , granularity = [
            "business_questions"
            , "business_rules"
            , "todos"
        ]
        , resource_type = [
                "model"
                , "seed"
                , "source"
        ]
        , show_resource_type = True
        , undefined = "Not defined"
        , undefined_as_null = True
        , files = [
            "models/.*"
            , "seeds/.*"
            , ".*/source.yml"
        ]
        , exclude_files = [
            'models/metadata_tests/.*'
            'models/description_tests/.*'
        ]
)}}
resource_name resource_type main_subject owner business_questions business_rules todos
dummy_model_1 model sales Alice How many stores of type ...? Stores of type A receive code B ... change to incremental
dummy_model_1 model sales Alice How many stores of type ...? Consider only stores open after ... change to incremental
dummy_model_1 model sales Alice How many stores in ...? Stores of type A receive code B ... change to incremental
dummy_model_1 model sales Alice How many stores in ...? Consider only stores open after ... change to incremental
dummy_model_2 model people Bob How many employees ...?
dummy_seed seed sales Carl
raw source

create_description_model (source)

This macro generates SQL for creating tables or views from the description of your nodes and its columns. You have the flexibility to select the specific files you want to include in your table or view.

The macro will check the description defined in your nodes.

Nodes can be:

  • Models
  • Sources
  • Seeds
  • Snapshots
  • Tests
  • Analyses
  • Macros

Arguments

  • resource_type (optional) (default = ['model']): A list of the resource types you want to read the metadata from. Options:
    • model
    • source
    • seed
    • snapshot
    • tests
    • analysis
    • macros
  • show_resource_type(optional) (default = True): A boolean to show or hide the resource_type column in your resulting model.
  • files(optional) (default = []): A list of regex specifying the files to include, e.g. If you want to include all files in models, then files=['models/.*'].
  • exclude_files(optional) (default = []): A list of regex specifying the files to exclude, e.g. If you want to exclude all staging files, then files=['.*stg_.*'].

Usage

Define the description of your nodes

So, for example take a look at the dummy_model_2.yml inside the integration_tests folder

version: 2

models:
  - name: 'dummy_model_2'
    description: "description of dummy_model_2"

    columns:
      - name: "dummy"
        description: "the description of the dummy column of dummy_model_2"

Create a model which uses the create_description_model macro.

Use the create_description_model macro passing the arguments of your choice

description_test_files

{{ metalog.create_description_model(
        resource_type = [
            "model"
            , "seed"
            , "source"
        ]
        , show_resource_type = True
        , files = [
            '.*.sql'
        ]
        , exclude_files = [
            'models/metadata.*'
        ]
)}}

Note: The default materialization for dbt models is view. If you want to change to table, change the materialized configuration property .

Run your model

Just run it!

dbt run -s metadata_view

Note Suppose we have the following nodes in our project: dummy_model_1, dummy_model_2 and dummy_seed

The output view, using the meta defined in our nodes will be:

resource_name resource_type resource_description column_name columns_description
dummy_model_1 model dummy the description of the dummy column of dummy_model_1
dummy_model_2 model description of dummy_model_2 dummy the description of the dummy column of dummy_model_2

πŸ”§ Troubleshooting

Unclosed string literal

Database Error in model description_model (models/metadata_catalog/description_model.sql)
Syntax error: Unclosed string literal at [2473:196]
compiled Code at target/run/marketing_mas/models/metadata_catalog/description_model.sql

Check if there is a double quote in any of your descriptions. If so, remove it or replace by single quotes.

The query is too large

Database Error in model description models_ view (models/metadata catalog/description models_ view.sql)
The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.
compiled Code at target/run/marketing_mas/models/metadata_catalog/description_models_view.sql

It is a current limitation of the package. As it passes the metadata/descriptions to the SQL query, if there is a massive number of metadata/descriptions there is a chance the query exceeds the limits of your DW.

✍️ ToDos

  • Implement CI
  • Create PR template
  • Workaround the query limit problem