starburstdata/dbt-trino

Incorrect Schema Used When Renaming Materialized Views

wjhrdy opened this issue · 4 comments

wjhrdy commented

Expected behavior

When renaming a materialized view to a backup version, the schema of the original materialized view should be retained. The backup version should be created in the same schema as the original materialized view.

Actual behavior

Currently, when renaming a materialized view to a backup version, the schema of the backup version is dropped. This results in the backup version being created in the default schema of the connection, instead of the schema of the original materialized view. This discrepancy causes an error when dbt checks for the backup materialized view to drop if it already exists, as it checks in the original schema, not finding the backup version that was created in the default schema.

Steps To Reproduce

  1. Create a view in a non-default schema using dbt-trino.
  2. Attempt to create a materialized view of the view just created by adding this to the view:
    {{
        config(
            materialized='materialized_view'
        )
    }}
    
  3. Observe that the backup version is created in the default schema of the connection, not the original schema of the materialized view and not cleaned up.
  4. Unmaterialize the view by removing the snippet. Run.
  5. Observe that the backup version is not cleaned up still.
  6. Rematerialize the view by adding the snippet. Run.
  7. When dbt tries to create the backup it already exists because it wasn't cleaned up, and creates an error.

Log output/Screenshots

Database Error in model int_table (intermediate/materialized_view.sql)
  TrinoUserError(type=USER_ERROR, name=GENERIC_USER_ERROR, message="line 1:1: Target view 'default_schema.materialized_view_dbt_backup' already exists")

the default_schema.materialized_view_dbt_backup should look something like default_schema_intermediate.materialized_view_dbt_backup

Operating System

macOS 13.4 (22F66)

dbt version

Core: - installed: 1.6.6 - latest: 1.6.6 - Up to date! Plugins: - trino: 1.6.2 - Up to date!

Trino Server version

423-e.1

Python version

Python 3.9.16

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
wjhrdy commented

There is a workaround we have found which is to use this config

{{
    config(
        materialized='materialized_view',
        full_refresh=true,
        pre_hook='use {{ model.schema }}'
    )
}}

A full_refresh will ensure that the DDL is actually updated too, otherwise only the view data will be updated, not the view definition itself. This aligns well with the expected way of deploying/restoring MVs with dbt in Trino: https://docs.getdbt.com/reference/resource-configs/trino-configs#materialized-view

Regarding the pre_hook setting. The problem is that when dbt runs the ALTER/RENAME MV statement it uses a connection/session with a predefined schema, which is the default schema from profiles.yml. But our MV is in a custom schema with "intermediate" suffix. The alter statement only defines full qualifier for the "source" view name, but not for the "target" (the new view name), and Trino defaults to the schema of the session when renaming the view. So it ends up in this broken state, when there's two views created in the wrong schema and the original MV doesn't exist in its expected location, so dbt fails to drop the temp table (dbt uses a/b swap for full MV refresh, with a backup view and a temp view). You can see that you generated a bunch of orphaned views and tables in the wrong schema because of this issue.

wjhrdy commented

relevant sections of dbt-trino code

{% macro trino__get_replace_materialized_view_as_sql(relation, sql, existing_relation, backup_relation, intermediate_relation) %}
{{- trino__get_create_materialized_view_as_sql(intermediate_relation, sql) }}
{% if existing_relation is not none %}
{{ log("Found a " ~ existing_relation.type ~ " with same name. Will drop it", info=true) }}
alter {{ existing_relation.type|replace("_", " ") }} {{ existing_relation }} rename to {{ backup_relation.include(database=False, schema=False) }};
{% endif %}
alter materialized view {{ intermediate_relation }} rename to {{ relation.include(database=False, schema=False) }};
{% endmacro %}

We can clearly see in that code, on line 16, that it ends up trying to rename the existing MV to a backup version, but it drops the schema from the backup version's name - backup_relation.include(database=False, schema=False)

It ends up creating the backup in whatever the default schema of the connection is, instead of the schema it should be in. However, in dbt-core, when it's setting up the MV, it drops the MV backup if it already exists using the actual name it should be, including schema, at https://github.com/dbt-labs/dbt-core/blob/v1.6.6/core/dbt/include/global_project/macros/materializations/models/materialized_view/materialized_view.sql#L32-L34 .

So, what happens is dbt checks for the backup MV to drop if it already exists, doesn't find it, and then dbt-trino tries to rename the existing MV to the backup version but in the wrong schema (that's different from where dbt-core just checked), causing this error.

Overall seems like a clear bug in dbt-trino I think? Without knowing any more context of that code, it seems like it should be using schema=True instead of schema=False when generating the SQL.

wjhrdy commented

This was a team effort, so thanks to Ben and Dalibor on my team for discovering the fixes laid out here.

@wjhrdy Thanks for reporting this! All of your investigation is correct, and indeed, it is a bug.
If you are willing to, please submit a PR for this one. But instead of changing from schema=False to schema=True, you could just remove invoking include method at all, and by default full qualifier wil be used.