dbt_utils deduplicate macro not working with trino
Closed this issue · 5 comments
Expected behavior
dbt_utils.deduplicate
uses a natural join
, which is not supported in trino. Other dbs have their own custom deduplicate macro, e.g. snowflake__deduplicate
.
I think the solution would be to have a separate trino__deduplicate
in dbt_utils.
Actual behavior
The query errors
Steps To Reproduce
Use dbt_utils.deduplicate
on any relation.
select *
from ({{ dbt_utils.deduplicate(
relation='some_relation',
partition_by="some_partitioning_column",
order_by="some_ordering_column desc"
) }})
Log output/Screenshots
Database Error TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="line 33:10: Natural join not supported", query_id=20230103_090105_00691_3kc63)
Operating System
Ubuntu 22.04
dbt version
1.3.1
Trino Server version
402-e.0
Python version
3.10.6
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Is this the right repository to post this issue, since it's trino-specific? Or does this belong in dbt_utils
?
It's already supported in dbt-trino-utils
https://github.com/starburstdata/dbt-trino-utils/blob/main/macros/dbt_utils/sql/deduplicate.sql
Please let us know your feedback about the package.
Thanks a lot! I didn't know that package existed.
I had to restart vscode for the 'dbt power user' extension to work again, but now it works like a charm.
I did
{% set target_table = ref("original_table") %}
...
deduped_result_with_priority_and_rn as (...)
{% set cols = get_columns_in_relation(target_table) %}
select {{ cols | map(attribute='column') | join(', ') }} from deduped_result_with_priority_and_rn
So that the schema is the same as the original. I'm not sure if that can be incorporated in the macro.
@RobbertDM unfortunately it can't. This macro removes duplicates from models, sources, and CTEs (CTE - third example in docs). The issue is related to CTE, as we can't retrieve list of columns from CTE in advance, as we can with models and sources. We decided to support all 3 types of objects on which you can use this macro as in dbt-utils
, at a cost of leaking rn
column. It will be fixed as soon as Trino would add support to natural join
.
@damian3031 Ah yes, that's also why I couldn't use dbt_utils.star(from=cte, except=rn)
: star does not seem to support CTEs, only models and sources.
Cool! Looking forward to it.