support merge_update_columns logic for unit_tests
rburke45 opened this issue · 2 comments
Is this a new bug in dbt-core?
- I believe this is a new bug in dbt-core
- I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
When a model with set merge_update_columns
is unit tested all fields are updated, not just those listed in merge_update_columns
.
Expected Behavior
When a model with set merge_update_columns
is unit tested only the fields listed in merge_update_columns
should be updated.
Steps To Reproduce
Minimal example:
table.sql
{{
config(
materialized='table'
)
}}
SELECT
1 AS col1,
1 AS col2,
8 AS col3
UNION ALL
SELECT
2 AS col1,
1 AS col2,
9 AS col3
UNION ALL
SELECT
3 AS col1,
1 AS col2,
5 AS col3
update.sql
{{
config(
materialized='incremental',
unique_key='col1',
merge_update_columns=['col2'],
)
}}
SELECT
col1,
col2,
col3
FROM
{{ref('table')}}
-- this filter will only be applied on an incremental run
{% if is_incremental() %}
WHERE
col2 > (SELECT MAX(col2) FROM {{ this }})
{% endif %}
unit_test.yml
unit_tests:
- name: merge_update_test
model: update
overrides:
macros:
# unit test this model in "incremental" mode
is_incremental: true
given:
- input: ref('table')
rows:
- {col1: 1, col2: 2, col3: 3}
- {col1: 2, col2: 3, col3: 4}
- input: this
rows:
- {col1: 1, col2: 1, col3: 8}
- {col1: 2, col2: 1, col3: 9}
- {col1: 3, col2: 1, col3: 5}
expect:
rows:
- {col1: 1, col2: 2, col3: 8}
- {col1: 2, col2: 3, col3: 9}
Running this unit test gives the log output below. Calling dbt run
to replicate the same data produces the expected behavior.
Relevant log output
actual differs from expected:
@@ ,col1,col2,col3
→ ,1 ,2 ,8→3
→ ,2 ,3 ,9→4
Environment
- OS: macOS 14.4.1 (23E224)
- Python: Python 3.12.2
- dbt-core: 1.8.0b1
Which database adapter are you using with dbt?
No response
Additional Context
I'm still relatively new to DBT, so please let me know if there's additional information I can provide.
@rburke45 Thanks for opening the issue (and for trying out dbt unit testing)!
In my opinion, this is a fair and reasonable limitation to unit tests and the merge_update_columns
/ merge_update_exclude_columns
configurations. When is_incremental: true
, the unit test is not validating the actual final result of the incremental update — it's validating the set of new rows & columns that the model's SQL is returning, which will then be applied within the materialization. (For this same reason, your unit test expect
doesn't include the col1: 3
row, either, even though that row would remain in {{ this }}
after the incremental update.)
I believe that an important principle of dbt is the separation of "transformation logic" and "materialization logic." As a general rule, dbt models should apply all transformation logic within their select
statements, and the goal of the materialization should be to take the exact returned dataset and apply it to an object in the database (whether via simple create
/replace
or more complex merge
/delete+insert
). The merge_update_columns
goes against this principle by saying, "There's one more piece of transformation logic to apply, during the materialization itself."
While that makes it more ergonomic to apply this trickier logic, it also makes it more difficult to preview or test the exact returned dataset before it's applied. Effectively, you can test this already in dbt, with a mocked (seed) input, a full model build, and an equality
data test against the final output — but this is testing more than just the "unit" of your model's select
statement.
Here's another way of accomplishing the same result — "on incremental runs, update col2
only" — but this time with that logic baked into the select
statement itself:
{{
config(
materialized='incremental',
unique_key='col1',
)
}}
select
{{ 'old.col1' if is_incremental() else 'new.col1' }},
new.col2,
{{ 'old.col3' if is_incremental() else 'new.col3' }},
from
{{ref('table')}} as new
-- this filter will only be applied on an incremental run
{% if is_incremental() %}
left join {{ this }} as old
on new.col1 = old.col1
where new.col2 > (select max(col2) from {{ this }})
{% endif %}
This SQL is trickier to read, but I do believe it's simpler to test, and more explicit as to what's happening: on incremental runs, col2
gets updated, but col1
+ col3
keep their old values. This model definition passes the unit test in your example when I run it locally. You can also preview the exact results you're going to get (dbt show -s update
) before you modify the existing table.
@MichelleArk has alerted me to the fact that this issue is quite similar to another one we'd opened:
This is blocked on introducing a different strategy for unit tests, where all input fixtures + expected + actual are materialized in the data warehouse: