Teradata/dbt-teradata

on_schema_change config has no effect

samulima opened this issue · 1 comment

Describe the bug

Setting the on_schema_change configuration to 'append_new_columns' or 'sync_all_columns' does not add new columns to an incremental table.
The feature was released in dbt version 0.21.

Steps To Reproduce

  • Create a model test.sql:
{{
    config(
        materialized='incremental',
        on_schema_change='append_new_columns'
        )
}}
with source_data as (
    select 1 as id
)
select *
from source_data
  • Run the model: dbt run -s test
  • Verify that a new table with the column id was created
  • Change the model test.sql to:
{{
    config(
        materialized='incremental',
        on_schema_change='append_new_columns'
        )
}}
with source_data as (
    select 1 as id, 2 as new_id
)
select *
from source_data
  • Run the model again: dbt run -s test
  • Observe that the table still contains only the column id (see the check below)
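
To confirm which columns the table actually has, you can inspect it directly in Teradata (adjust the database qualification to match your profile's target schema):

    HELP TABLE test;
    -- or via the dictionary view:
    SELECT ColumnName FROM DBC.ColumnsV WHERE TableName = 'test';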

Expected behavior

The latter run should have added the new column new_id to the test table.
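
In other words, before the insert the adapter would be expected to issue DDL along these lines (illustrative only; the actual column type would come from the adapter's type mapping):

    ALTER TABLE test ADD new_id INTEGER;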

The output of dbt --version:
The result is the same with both of the following versions.

installed version: 1.0.1
Plugins:  
- teradata: 1.0.1.1

And with the newest version of the adapter, the output is itself broken:

07:37:03  Encountered an error:
Could not parse version "1.0.6.0"

The operating system you're using:
Windows 10
The output of python --version:
Python 3.9.13 with dbt-teradata version 1.0.6.0
Python 3.9.11 with dbt-teradata version 1.0.1.1

@samulima, thanks for taking the time to report the issue. I'm going to capture what I've been able to learn, mainly for my future self and the other Teradata team members. If you would like to take a stab at it and try to improve the connector, we would very much appreciate your contribution.

The core logic for the incremental materialization resides in dbt/include/teradata/macros/materializations/incremental/incremental.sql. When the existing relation is not a view and a full refresh has not been requested, the following part of the template handles the increment:

   {% set tmp_relation = make_temp_relation(target_relation) %}
   {% do run_query(create_table_as(True, tmp_relation, sql)) %}
   {% do adapter.expand_target_column_types(
          from_relation=tmp_relation,
          to_relation=target_relation) %}
   {% set build_sql = incremental_upsert(tmp_relation, target_relation, unique_key=unique_key) %}
   {% do to_drop.append(tmp_relation) %}

This code creates a temporary table and populates it with the data as defined in the model SQL. It then calls the incremental_upsert macro, which is implemented in dbt/include/teradata/macros/materializations/incremental/helpers.sql:

    {%- if unique_key is not none -%}
    DELETE
    FROM {{ target_relation }}
    WHERE ({{ unique_key }}) IN (
        SELECT ({{ unique_key }})
        FROM {{ tmp_relation }}
    );
    {%- endif %}

    INSERT INTO {{ target_relation }} ({{ dest_cols_csv }})
       SELECT {{ dest_cols_csv }}
       FROM {{ tmp_relation }}
    ;

This SQL removes old rows from the target table (if a unique key is defined) and inserts the new rows from the temporary table. There are no DDL statements here.

The temporary source relation already includes the additional column: it is built purely from the model SQL and never "looks" at the target relation. Since we never issue DDL against the target table, it never gains the newly added columns that appear in the source table.
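
To make the failure concrete: if, as in dbt-core's default incremental_upsert helper, dest_cols_csv is derived from adapter.get_columns_in_relation(target_relation), then the second run of the model above renders roughly the following statement (assuming the default __dbt_tmp temp suffix), silently dropping new_id:

    INSERT INTO test (id)
       SELECT id
       FROM test__dbt_tmp
    ;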

The likely fix is to check for on_schema_change='append_new_columns', compute the column delta between the temporary and target relations, and add the missing columns to the target table. Once this is done, proceed with the current incremental upsert logic.
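
A minimal sketch of that approach, assuming the delta is computed with adapter.get_columns_in_relation (the macro name and placement are illustrative, not a final implementation; dbt-core has also shipped schema-change helpers for this since 0.21 that the materialization could delegate to instead):

    {% macro teradata__append_new_columns(tmp_relation, target_relation) %}
      {%- set source_columns = adapter.get_columns_in_relation(tmp_relation) -%}
      {%- set target_columns = adapter.get_columns_in_relation(target_relation) -%}
      {%- set target_names = target_columns | map(attribute='name') | list -%}
      {#-- add every column present in the temp table but missing from the target --#}
      {%- for col in source_columns if col.name not in target_names %}
        {% do run_query('ALTER TABLE ' ~ target_relation ~ ' ADD ' ~ col.name ~ ' ' ~ col.data_type) %}
      {%- endfor %}
    {% endmacro %}

The incremental materialization would call this right after creating tmp_relation (guarded by the on_schema_change setting), so the subsequent INSERT already sees the widened target table.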