ministryofjustice/data-catalogue

Lineage from CaDeT

MatMoore opened this issue · 4 comments

CaDeT can provide full lineage from source to derived table, but we don't have this working at the moment because we are only including a select list of tables.

This is controlled by node_name_pattern, which takes either an include list or an exclude list.

There are some low-value tables that are created as part of the implementation of create_a_derived_table. Is there any way we can exclude these intermediate tables without losing the lineage on either side of them?

Alternatively, can we ingest everything, but tag assets in such a way that we can filter out these intermediate tables from the search in find-moj-data?

Desired outcome of spike

  • A working recipe we can use for dbt going forwards
  • We are still able to hide intermediate tables from end users looking for data to use

Note: I briefly tried this node name pattern, but it seemed to break the lineage:

        node_name_pattern:
            deny:
                - '.*intermediate'
                - '.*_intm_'
                - '.*_joined'
                - '.*_filtered'
                - '.*_stg_'
                - '.*_sensitive_'
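
To illustrate why a deny list like this could break lineage, here is a minimal Python sketch. It is not the ingestion code. The node names and the `UPSTREAMS` graph are made up, and it assumes each pattern is applied as a regular expression matched from the start of the node name. It compares simply dropping denied nodes with "stitching" lineage across them.

```python
import re

# Deny patterns copied from the recipe above.
DENY = [
    r".*intermediate",
    r".*_intm_",
    r".*_joined",
    r".*_filtered",
    r".*_stg_",
    r".*_sensitive_",
]

# Hypothetical lineage graph: node -> list of direct upstream nodes.
UPSTREAMS = {
    "source.raw_cases": [],
    "model.cases_stg_clean": ["source.raw_cases"],
    "model.cases_joined": ["model.cases_stg_clean"],
    "model.cases_summary": ["model.cases_joined"],
}


def is_denied(name, patterns=DENY):
    """True if any deny pattern matches from the start of the name (assumption)."""
    return any(re.match(p, name) for p in patterns)


def naive_filter(upstreams):
    """Drop denied nodes and any edges that point at them."""
    kept = {n for n in upstreams if not is_denied(n)}
    return {n: [p for p in upstreams[n] if p in kept] for n in kept}


def stitched_filter(upstreams):
    """Drop denied nodes, but link each kept node to its nearest kept ancestors."""

    def visible_parents(node, seen):
        result = []
        for parent in upstreams.get(node, []):
            if parent in seen:
                continue
            seen.add(parent)
            if is_denied(parent):
                # Skip over the hidden node and keep walking upstream.
                result.extend(visible_parents(parent, seen))
            else:
                result.append(parent)
        return result

    return {n: visible_parents(n, set()) for n in upstreams if not is_denied(n)}


print(naive_filter(UPSTREAMS)["model.cases_summary"])
print(stitched_filter(UPSTREAMS)["model.cases_summary"])
```

In this toy graph the naive filter leaves `model.cases_summary` with no upstreams at all, while the stitched version still links it back to `source.raw_cases`. Whether the real ingestion can do this kind of stitching is exactly what the spike needs to find out.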

There is a meeting with DMET re. CaDeT at 14:15 on Thursday 09/05. If you pick this spike up and are not on the invite, ask me and I'll forward it.

The CaDeT ingestion is planned to work from a flag/tag in the dbt data that indicates whether an asset should be catalogued. This will be decided and set manually by the CaDeT user.

Working assumption is that we will ingest everything, including the intermediate tables, but we only want to display the end products in the front end. How will this work?
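
One way this could work, as a rough Python sketch rather than a proposal for the actual implementation: every asset is ingested, search only returns assets carrying a display tag, and lineage is walked over the full set. The tag name `display_in_catalogue`, the `Asset` class and the example assets are all hypothetical (needs Python 3.9+).

```python
from dataclasses import dataclass, field

# Hypothetical tag name; the real flag would be whatever CaDeT users set in dbt.
DISPLAY_TAG = "display_in_catalogue"


@dataclass
class Asset:
    name: str
    tags: set[str] = field(default_factory=set)
    upstreams: list[str] = field(default_factory=list)


# Everything is ingested, including the intermediate tables.
CATALOGUE = {
    a.name: a
    for a in [
        Asset("source.raw_cases"),
        Asset("model.cases_stg_clean", upstreams=["source.raw_cases"]),
        Asset("model.cases_joined", upstreams=["model.cases_stg_clean"]),
        Asset(
            "model.cases_summary",
            tags={DISPLAY_TAG},
            upstreams=["model.cases_joined"],
        ),
    ]
}


def search(catalogue, term):
    """Front-end search: only assets tagged for display are returned."""
    return sorted(
        a.name
        for a in catalogue.values()
        if term in a.name and DISPLAY_TAG in a.tags
    )


def full_lineage(catalogue, name):
    """Walk all upstreams, tagged or not, so the lineage stays complete."""
    chain = []
    stack = list(catalogue[name].upstreams)
    while stack:
        current = stack.pop()
        if current not in chain:
            chain.append(current)
            stack.extend(catalogue[current].upstreams)
    return chain


print(search(CATALOGUE, "cases"))
print(full_lineage(CATALOGUE, "model.cases_summary"))
```

Here a search for "cases" returns only `model.cases_summary`, but its lineage still includes the untagged staging and joined tables. An open question is whether the front end should show those hidden nodes in lineage views or collapse them.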