Incorrect Schema Used When Renaming Materialized Views
wjhrdy opened this issue · 4 comments
Expected behavior
When renaming a materialized view to a backup version, the schema of the original materialized view should be retained. The backup version should be created in the same schema as the original materialized view.
Actual behavior
Currently, when renaming a materialized view to a backup version, the schema of the backup version is dropped. This results in the backup version being created in the default schema of the connection, instead of the schema of the original materialized view. This discrepancy causes an error when dbt checks for the backup materialized view to drop if it already exists, as it checks in the original schema, not finding the backup version that was created in the default schema.
Steps To Reproduce
- Create a view in a non-default schema using dbt-trino.
- Attempt to create a materialized view of the view just created by adding this to the view:
{{ config( materialized='materialized_view' ) }}
- Observe that the backup version is created in the default schema of the connection, not the original schema of the materialized view and not cleaned up.
- Unmaterialize the view by removing the snippet. Run.
- Observe that the backup version is not cleaned up still.
- Rematerialize the view by adding the snippet. Run.
- When dbt tries to create the backup it already exists because it wasn't cleaned up, and creates an error.
Log output/Screenshots
Database Error in model int_table (intermediate/materialized_view.sql)
TrinoUserError(type=USER_ERROR, name=GENERIC_USER_ERROR, message="line 1:1: Target view 'default_schema.materialized_view_dbt_backup' already exists")
the default_schema.materialized_view_dbt_backup
should look something like default_schema_intermediate.materialized_view_dbt_backup
Operating System
macOS 13.4 (22F66)
dbt version
Core: - installed: 1.6.6 - latest: 1.6.6 - Up to date! Plugins: - trino: 1.6.2 - Up to date!
Trino Server version
423-e.1
Python version
Python 3.9.16
Are you willing to submit PR?
- Yes I am willing to submit a PR!
There is a workaround we have found which is to use this config
{{
config(
materialized='materialized_view',
full_refresh=true,
pre_hook='use {{ model.schema }}'
)
}}
A full_refresh
will ensure that the DDL is actually updated too, otherwise only the view data will be updated, not the view definition itself. This aligns well with the expected way of deploying/restoring MVs with dbt in Trino: https://docs.getdbt.com/reference/resource-configs/trino-configs#materialized-view
Regarding the pre_hook
setting. The problem is that when dbt runs the ALTER/RENAME MV statement it uses a connection/session with a predefined schema, which is the default schema from profiles.yml. But our MV is in a custom schema with "intermediate" suffix. The alter statement only defines full qualifier for the "source" view name, but not for the "target" (the new view name), and Trino defaults to the schema of the session when renaming the view. So it ends up in this broken state, when there's two views created in the wrong schema and the original MV doesn't exist in its expected location, so dbt fails to drop the temp table (dbt uses a/b swap for full MV refresh, with a backup view and a temp view). You can see that you generated a bunch of orphaned views and tables in the wrong schema because of this issue.
relevant sections of dbt-trino code
We can clearly see in that code, on line 16, that it ends up trying to rename the existing MV to a backup version, but it drops the schema from the backup version's name - backup_relation.include(database=False, schema=False)
It ends up creating the backup in whatever the default schema of the connection is, instead of the schema it should be in. However, in dbt-core, when it's setting up the MV, it drops the MV backup if it already exists using the actual name it should be, including schema, at https://github.com/dbt-labs/dbt-core/blob/v1.6.6/core/dbt/include/global_project/macros/materializations/models/materialized_view/materialized_view.sql#L32-L34 .
So, what happens is dbt checks for the backup MV to drop if it already exists, doesn't find it, and then dbt-trino tries to rename the existing MV to the backup version but in the wrong schema (that's different from where dbt-core just checked), causing this error.
Overall seems like a clear bug in dbt-trino
I think? Without knowing any more context of that code, it seems like it should be using schema=True
instead of schema=False
when generating the SQL.
This was a team effort, so thanks to Ben and Dalibor on my team for discovering the fixes laid out here.
@wjhrdy Thanks for reporting this! All of your investigation is correct, and indeed, it is a bug.
If you are willing to, please submit a PR for this one. But instead of changing from schema=False
to schema=True
, you could just remove invoking include
method at all, and by default full qualifier wil be used.