Merge Strategy + on_schema_change seem to be fixed
Closed this issue · 10 comments
Expected behavior
My config doesn't state an on_schema_change or incremental_strategy so the defaults of 'ignore' and 'append' should be applied
Actual behavior
__dbt_alter column is being added to the target table despite the column already existing and then an update is attempted but fails.
Steps To Reproduce
The only non-standard setup item is an override macro for inserts as we have pre-deployed wide tables that we don't want to remove or specify each column within the model. This macro (posted below) just inserts into the fields that exist within the model. This doesn't explain why columns are being added to the table though.
{#
NOTE: This is for dbt-trino version >= 1.2.1
Override macro to enable inserts into destination table columns where the name matches the
source model columns.
This is required as the CDM tables often have lots of columns that wont be available from
client data models.
Change summary:
1. swapped adapter.get_columns_in_relation(target_relation) to adapter.get_columns_in_relation(tmp_relation)
2. added a field list to the insert statement to stop ordinal matching
#}
{% macro get_append_sql(target_relation, tmp_relation, dest_columns) %}
{%- set dest_columns = adapter.get_columns_in_relation(tmp_relation) -%}
{%- set dest_cols_csv = dest_columns | map(attribute='quoted') | join(', ') -%}
insert into {{ target_relation }} ({{ dest_cols_csv }})
select {{dest_cols_csv}} from {{ tmp_relation.include(database=true, schema=true) }};
drop table if exists {{ tmp_relation }};
{% endmacro %}
Log output/Screenshots
10:01:24 TrinoUserError(type=USER_ERROR, name=PERMISSION_DENIED, message="Access Denied: Cannot update columns [added_by__dbt_alter] in table sob_output.a_item", query_id=20221031_100124_00119_4uxjj)
Operating System
20.04 LTS (Focal Fossa)
dbt version
1.3.0
Trino Server version
393-e.1
Python version
Python 3.7.6
Are you willing to submit PR?
- Yes I am willing to submit a PR!
I'm not able to reproduce the issue, is your model configuration just materialized='incremental'
as below? Please post your dbt_project
config as well.
{{
config(
materialized='incremental'
)
}}
Is the issue exists when you don't override the macro?
That model is definitely set to incremental in my dbt_project.yml
.
output: # F Tables
+materialized: incremental
+schema: output
+database: minio_presto_ail
I'll attempt it without the macro but it's worked absolutely fine up to 1.3?
It would be nice if you can test it without the macro. Note that there was large refactoring of incremental materialization in dbt_core.
I'll test it this morning. I actually think this is an issue caused by dbt-core, not dbt-trino.
Renaming the macro so it doesn't take effect makes no difference.
Ok... narrowed it down a bit! It's trying to alter a column because the varchar lengths are different. The target column has a data type of varchar(40) and current_user (which is being used to insert into that field) returns a varchar(256). I've fixed this by casting current_user to varchar(40) but should it be attempting this when on_schema_change is defaulting to ignore?
============================== 2022-11-03 09:31:43.562607 | ec321820-d10c-4ee8-943d-05847551cfab ==============================
�[0m09:31:43.562645 [info ] [MainThread]: Running with dbt=1.3.0
�[0m09:31:43.563414 [debug] [MainThread]: running dbt with arguments {'write_json': True, 'use_colors': True, 'printer_width': 80, 'version_check': True, 'partial_parse': True, 'static_parser': True, 'profiles_dir': '/home/ail_aeveri01/.dbt', 'send_anonymous_usage_stats': False, 'event_buffer_size': 100000, 'quiet': False, 'no_print': False, 'select': ['a_company'], 'which': 'run', 'rpc_method': 'run', 'indirect_selection': 'eager'}
�[0m09:31:43.563592 [debug] [MainThread]: Tracking: do not track
�[0m09:31:43.846109 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
�[0m09:31:43.847513 [debug] [MainThread]: Partial parsing: updated file: sob_cdm://macros/trino__get_delete_insert_merge_sql.sql
�[0m09:31:43.847751 [debug] [MainThread]: Parsing macros/trino__get_delete_insert_merge_sql.sql
�[0m09:31:43.849696 [debug] [MainThread]: Parsing macros/materializations/models/incremental/merge.sql
�[0m09:31:43.864348 [debug] [MainThread]: Parsing macros/materializations/models/incremental/strategies.sql
�[0m09:31:43.971694 [warn ] [MainThread]: [�[33mWARNING�[0m]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- seeds.sob_cdm.customer_groups
�[0m09:31:44.152702 [info ] [MainThread]: Found 114 models, 131 tests, 0 snapshots, 4 analyses, 730 macros, 0 operations, 3 seed files, 119 sources, 0 exposures, 0 metrics
�[0m09:31:44.160041 [info ] [MainThread]:
�[0m09:31:44.160776 [debug] [MainThread]: Acquiring new trino connection "master"
�[0m09:31:44.162403 [debug] [ThreadPool]: Acquiring new trino connection "list_minio_presto_ail"
�[0m09:31:44.185168 [debug] [ThreadPool]: Using trino connection "list_minio_presto_ail"
�[0m09:31:44.185364 [debug] [ThreadPool]: On list_minio_presto_ail: select distinct schema_name
from minio_presto_ail.INFORMATION_SCHEMA.schemata
�[0m09:31:44.185539 [debug] [ThreadPool]: Opening a new connection, currently in state init
�[0m09:31:44.298648 [debug] [ThreadPool]: SQL status: SUCCESS in 0.11 seconds
�[0m09:31:44.301750 [debug] [ThreadPool]: On list_minio_presto_ail: Close
�[0m09:31:44.316957 [debug] [ThreadPool]: Acquiring new trino connection "list_minio_presto_ail_sob_temp"
�[0m09:31:44.334297 [debug] [ThreadPool]: Using trino connection "list_minio_presto_ail_sob_temp"
�[0m09:31:44.334489 [debug] [ThreadPool]: On list_minio_presto_ail_sob_temp: select
table_catalog as database,
table_name as name,
table_schema as schema,
case when table_type = 'BASE TABLE' then 'table'
when table_type = 'VIEW' then 'view'
else table_type
end as table_type
from minio_presto_ail.INFORMATION_SCHEMA.tables
where table_schema = 'sob_temp'
�[0m09:31:44.334638 [debug] [ThreadPool]: Opening a new connection, currently in state closed
�[0m09:31:44.443793 [debug] [ThreadPool]: SQL status: SUCCESS in 0.11 seconds
�[0m09:31:44.446635 [debug] [ThreadPool]: On list_minio_presto_ail_sob_temp: ROLLBACK
�[0m09:31:44.447056 [debug] [ThreadPool]: On list_minio_presto_ail_sob_temp: Close
�[0m09:31:44.448914 [debug] [ThreadPool]: Acquiring new trino connection "list_minio_presto_ail_sob_scope"
�[0m09:31:44.456283 [debug] [ThreadPool]: Using trino connection "list_minio_presto_ail_sob_scope"
�[0m09:31:44.456525 [debug] [ThreadPool]: On list_minio_presto_ail_sob_scope: select
table_catalog as database,
table_name as name,
table_schema as schema,
case when table_type = 'BASE TABLE' then 'table'
when table_type = 'VIEW' then 'view'
else table_type
end as table_type
from minio_presto_ail.INFORMATION_SCHEMA.tables
where table_schema = 'sob_scope'
�[0m09:31:44.456699 [debug] [ThreadPool]: Opening a new connection, currently in state closed
�[0m09:31:44.534195 [debug] [ThreadPool]: SQL status: SUCCESS in 0.08 seconds
�[0m09:31:44.539241 [debug] [ThreadPool]: On list_minio_presto_ail_sob_scope: ROLLBACK
�[0m09:31:44.539544 [debug] [ThreadPool]: On list_minio_presto_ail_sob_scope: Close
�[0m09:31:44.540935 [debug] [ThreadPool]: Acquiring new trino connection "list_minio_presto_ail_sob_output"
�[0m09:31:44.546075 [debug] [ThreadPool]: Using trino connection "list_minio_presto_ail_sob_output"
�[0m09:31:44.546285 [debug] [ThreadPool]: On list_minio_presto_ail_sob_output: select
table_catalog as database,
table_name as name,
table_schema as schema,
case when table_type = 'BASE TABLE' then 'table'
when table_type = 'VIEW' then 'view'
else table_type
end as table_type
from minio_presto_ail.INFORMATION_SCHEMA.tables
where table_schema = 'sob_output'
�[0m09:31:44.551260 [debug] [ThreadPool]: Opening a new connection, currently in state closed
�[0m09:31:44.660192 [debug] [ThreadPool]: SQL status: SUCCESS in 0.11 seconds
�[0m09:31:44.664570 [debug] [ThreadPool]: On list_minio_presto_ail_sob_output: ROLLBACK
�[0m09:31:44.665023 [debug] [ThreadPool]: On list_minio_presto_ail_sob_output: Close
�[0m09:31:44.668212 [debug] [MainThread]: On master: COMMIT
�[0m09:31:44.669523 [info ] [MainThread]: Concurrency: 1 threads (target='dev')
�[0m09:31:44.670292 [info ] [MainThread]:
�[0m09:31:44.674328 [debug] [Thread-1 ]: Began running node model.sob_cdm.a_company
�[0m09:31:44.674789 [info ] [Thread-1 ]: 1 of 1 START sql incremental model sob_output.a_company ........................ [RUN]
�[0m09:31:44.675735 [debug] [Thread-1 ]: Acquiring new trino connection "model.sob_cdm.a_company"
�[0m09:31:44.676109 [debug] [Thread-1 ]: Began compiling node model.sob_cdm.a_company
�[0m09:31:44.676470 [debug] [Thread-1 ]: Compiling model.sob_cdm.a_company
�[0m09:31:44.704855 [debug] [Thread-1 ]: Writing injected SQL for node "model.sob_cdm.a_company"
�[0m09:31:44.705363 [debug] [Thread-1 ]: finished collecting timing info
�[0m09:31:44.705566 [debug] [Thread-1 ]: Began executing node model.sob_cdm.a_company
�[0m09:31:44.751856 [debug] [Thread-1 ]: Using trino connection "model.sob_cdm.a_company"
�[0m09:31:44.752035 [debug] [Thread-1 ]: On model.sob_cdm.a_company: create table minio_presto_ail.sob_output.a_company__dbt_tmp
as (
with nat_region as (
select
*
, 1 as ledger_id
from
minio_presto_ail.sob_scope.m_nat_region
),
final as (
select
sha256(to_utf8(coalesce(cast(ledger_id as varchar), '') || ':' || coalesce(cast(co_code as varchar), '') )) as company_key
, sha256(to_utf8(coalesce(cast(ledger_id as varchar), '') )) as ledger_key
, co_code as company_reference
, region_name as company_description
, 'Canada' as country
, 'CAD' as local_currency
, '$' as currency_symbol
, cast(current_timestamp as timestamp(3)) as added_on
, current_user as added_by
, cast(current_timestamp as timestamp(3)) as changed_on
, current_user as changed_by
from
nat_region
)
select
*
from
final
where
not exists (
select
1
from
minio_presto_ail.sob_output.a_company ref
where
ref.company_key = final.company_key
)
)
�[0m09:31:44.752183 [debug] [Thread-1 ]: Opening a new connection, currently in state closed
�[0m09:31:45.171514 [debug] [Thread-1 ]: SQL status: SUCCESS in 0.42 seconds
�[0m09:31:45.191706 [debug] [Thread-1 ]: Using trino connection "model.sob_cdm.a_company"
�[0m09:31:45.191905 [debug] [Thread-1 ]: On model.sob_cdm.a_company: describe minio_presto_ail.sob_output.a_company__dbt_tmp
�[0m09:31:45.334303 [debug] [Thread-1 ]: SQL status: SUCCESS in 0.14 seconds
�[0m09:31:45.343735 [debug] [Thread-1 ]: Using trino connection "model.sob_cdm.a_company"
�[0m09:31:45.343977 [debug] [Thread-1 ]: On model.sob_cdm.a_company: describe minio_presto_ail.sob_output.a_company
�[0m09:31:45.453364 [debug] [Thread-1 ]: SQL status: SUCCESS in 0.11 seconds
�[0m09:31:45.456082 [debug] [Thread-1 ]: Changing col type from varchar(40) to varchar(256) in table _ReferenceKey(database='minio_presto_ail', schema='sob_output', identifier='a_company')
�[0m09:31:45.468574 [debug] [Thread-1 ]: Using trino connection "model.sob_cdm.a_company"
�[0m09:31:45.468776 [debug] [Thread-1 ]: On model.sob_cdm.a_company: alter table minio_presto_ail.sob_output.a_company add column "added_by__dbt_alter" varchar(256)
�[0m09:31:45.538235 [debug] [Thread-1 ]: SQL status: SUCCESS in 0.07 seconds
�[0m09:31:45.538688 [debug] [Thread-1 ]: Using trino connection "model.sob_cdm.a_company"
�[0m09:31:45.539015 [debug] [Thread-1 ]: On model.sob_cdm.a_company: update minio_presto_ail.sob_output.a_company set "added_by__dbt_alter" = "added_by"
�[0m09:31:45.587119 [debug] [Thread-1 ]: Trino adapter: Trino query id: 20221103_093145_00163_4uxjj
�[0m09:31:45.587518 [debug] [Thread-1 ]: Trino adapter: Trino error: TrinoUserError(type=USER_ERROR, name=PERMISSION_DENIED, message="Access Denied: Cannot update columns [added_by__dbt_alter] in table sob_output.a_company", query_id=20221103_093145_00163_4uxjj)
�[0m09:31:45.588166 [debug] [Thread-1 ]: finished collecting timing info
�[0m09:31:45.588569 [debug] [Thread-1 ]: On model.sob_cdm.a_company: ROLLBACK
�[0m09:31:45.588882 [debug] [Thread-1 ]: On model.sob_cdm.a_company: Close
�[0m09:31:45.589974 [debug] [Thread-1 ]: Database Error in model a_company (models/output/a_company.sql)
TrinoUserError(type=USER_ERROR, name=PERMISSION_DENIED, message="Access Denied: Cannot update columns [added_by__dbt_alter] in table sob_output.a_company", query_id=20221103_093145_00163_4uxjj)
�[0m09:31:45.590723 [error] [Thread-1 ]: 1 of 1 ERROR creating sql incremental model sob_output.a_company ............... [�[31mERROR�[0m in 0.92s]
�[0m09:31:45.591755 [debug] [Thread-1 ]: Finished running node model.sob_cdm.a_company
�[0m09:31:45.681360 [debug] [MainThread]: Acquiring new trino connection "master"
�[0m09:31:45.681993 [debug] [MainThread]: On master: COMMIT
�[0m09:31:45.683182 [info ] [MainThread]:
�[0m09:31:45.683916 [info ] [MainThread]: Finished running 1 incremental model in 0 hours 0 minutes and 1.52 seconds (1.52s).
�[0m09:31:45.684589 [debug] [MainThread]: Connection 'master' was properly closed.
�[0m09:31:45.684927 [debug] [MainThread]: Connection 'model.sob_cdm.a_company' was properly closed.
�[0m09:31:45.821278 [info ] [MainThread]:
�[0m09:31:45.821639 [info ] [MainThread]: �[31mCompleted with 1 error and 0 warnings:�[0m
�[0m09:31:45.821919 [info ] [MainThread]:
�[0m09:31:45.822150 [error] [MainThread]: �[33mDatabase Error in model a_company (models/output/a_company.sql)�[0m
�[0m09:31:45.822379 [error] [MainThread]: TrinoUserError(type=USER_ERROR, name=PERMISSION_DENIED, message="Access Denied: Cannot update columns [added_by__dbt_alter] in table sob_output.a_company", query_id=20221103_093145_00163_4uxjj)
�[0m09:31:45.822602 [info ] [MainThread]:
�[0m09:31:45.822825 [info ] [MainThread]: Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
�[0m09:31:45.823126 [debug] [MainThread]: Flushing usage events
The alter table is not coming from the on_schema_change
. It is triggered through the following code
{% do adapter.expand_target_column_types(
from_relation=tmp_relation,
to_relation=target_relation) %}
because the inserted data has a different precision from the existing data as you found out yourself Changing col type from varchar(40) to varchar(256)
.
Which connector are you using? It seems that it doesn't allow updates?
It's just a standard hive connector, we're going to be migrating to iceberg early next year
Ok, so in your case what you need to do is ensure the precision of the columns doesn't change over incremental invocations.
We can continue the conversation on Slack but closing the issue as it works as expected.