starburstdata/dbt-trino

dbt_utils deduplicate macro not working with trino

Closed this issue · 5 comments

Expected behavior

dbt_utils.deduplicate uses a natural join, which is not supported in trino. Other dbs have their own custom deduplicate macro, e.g. snowflake__deduplicate.
I think the solution would be to have a separate trino__deduplicate in dbt_utils.

Actual behavior

The query errors

Steps To Reproduce

Use dbt_utils.deduplicate on any relation.

select *
from ({{ dbt_utils.deduplicate(
    relation='some_relation',
    partition_by="some_partitioning_column",
    order_by="some_ordering_column desc"
    ) }})

Log output/Screenshots

Database Error TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="line 33:10: Natural join not supported", query_id=20230103_090105_00691_3kc63)

Operating System

Ubuntu 22.04

dbt version

1.3.1

Trino Server version

402-e.0

Python version

3.10.6

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Is this the right repository to post this issue, since it's trino-specific? Or does this belong in dbt_utils?

It's already supported in dbt-trino-utils https://github.com/starburstdata/dbt-trino-utils/blob/main/macros/dbt_utils/sql/deduplicate.sql

Please let us know your feedback about the package.

Thanks a lot! I didn't know that package existed.
I had to restart vscode for the 'dbt power user' extension to work again, but now it works like a charm.

I did

{% set target_table = ref("original_table") %}

...
deduped_result_with_priority_and_rn as (...)

{% set cols = get_columns_in_relation(target_table) %}
select {{ cols | map(attribute='column') | join(', ') }} from deduped_result_with_priority_and_rn

So that the schema is the same as the original. I'm not sure if that can be incorporated in the macro.

@RobbertDM unfortunately it can't. This macro removes duplicates from models, sources, and CTEs (CTE - third example in docs). The issue is related to CTE, as we can't retrieve list of columns from CTE in advance, as we can with models and sources. We decided to support all 3 types of objects on which you can use this macro as in dbt-utils, at a cost of leaking rn column. It will be fixed as soon as Trino would add support to natural join.

@damian3031 Ah yes, that's also why I couldn't use dbt_utils.star(from=cte, except=rn): star does not seem to support CTEs, only models and sources.

Cool! Looking forward to it.