Incorrect Case Sensitivity Handling of Identifier Names in Table Creation
the-ALAM opened this issue · 4 comments
Description
- When using dbt with DuckDB to create tables, I encountered a runtime error during the execution of the `Zipcode` model with its materialization set to `table`.
- We suspected a Postgres parser behavior that folds unquoted identifiers to lower case.
Reproduction Steps
- Set up dbt with DuckDB.
- Define a model named `Zipcode` (or any other name with the first letter capitalized) and set its materialization to `table`.
- Execute the dbt run/build command (also reproducible with `--full-refresh`).
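For concreteness, the failing model can be reduced to a single SQL file whose name starts with a capital letter. A minimal sketch (the file name matters; the query and columns are made up for illustration):

```sql
-- models/Zipcode.sql  (any model name with a capitalized first letter reproduces it)
{{ config(materialized='table') }}

select
    '90210'         as zip,   -- placeholder data for illustration
    'Beverly Hills' as city
```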
Expected Behavior
- The `Zipcode` model should be created successfully without encountering any errors.
Detailed Behavior
- Start a transaction.
- Create the new version of the table in a temporary location (suffixed `__dbt_tmp`).
- Rename the existing version of the table to be the backup version (suffixed `__dbt_backup`).
- Rename the new version of the table to remove the suffix (`__dbt_tmp` → ``).
- Commit the transaction.
- Drop the old/backup version (on many databases, drops need to happen outside of transactions).
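The steps above correspond roughly to the following SQL; the schema name, table name, and model query here are illustrative, and the exact statements dbt emits may differ:

```sql
BEGIN;
-- build the new version under a temporary name
CREATE TABLE my_schema.Zipcode__dbt_tmp AS SELECT /* model query */ 1 AS dummy;
-- the existing version becomes the backup
ALTER TABLE my_schema.Zipcode RENAME TO Zipcode__dbt_backup;
-- strip the suffix from the new version
ALTER TABLE my_schema.Zipcode__dbt_tmp RENAME TO Zipcode;
COMMIT;
-- drops often have to happen outside the transaction
DROP TABLE my_schema.Zipcode__dbt_backup;
```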
Actual Behavior
- Encountered the runtime error below during the execution of the `Zipcode` model:

      Runtime Error in model {model_name} (models_{dir_name}\{model_name}.sql)
        Failed to execute query "ALTER TABLE {schema_name}.Zipcode__dbt_tmp RENAME TO Zipcode":
        ERROR: relation "{schema_name}.zipcode__dbt_tmp" does not exist
Discovery Process
- Discovered the bug while attempting to create the `Zipcode` model with materialization set to `table`.
- It seems that DuckDB might be transforming the table name's casing, causing the `ALTER TABLE` statement to fail: the relation cannot be found because its name has been folded to lower case.
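Standard PostgreSQL identifier folding would explain the symptom: quoted identifiers keep their case in the catalog, while unquoted identifiers are folded to lower case before lookup. If the temporary table is created with a quoted mixed-case name (e.g. by DuckDB's Postgres writer) but later renamed with an unquoted reference, the lookup misses. A plain-Postgres illustration (no DuckDB involved):

```sql
CREATE TABLE "Zipcode__dbt_tmp" (zip text);      -- quoted: stored as Zipcode__dbt_tmp
ALTER TABLE Zipcode__dbt_tmp RENAME TO Zipcode;  -- unquoted: looked up as zipcode__dbt_tmp
-- ERROR:  relation "zipcode__dbt_tmp" does not exist
```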
Environment
- Operating System: Windows 11 & Ubuntu 22.04
- Python Version: 3.12.0
- DBT Version: 1.7.10
- DuckDB Version: 0.10.0
Additional Information
- When using materialization set to `incremental` and dropping the table beforehand, the issue does not occur.
- This behavior suggests a potential casing bug in DuckDB that may affect the execution of certain SQL statements and their use in certain frameworks.
Ah yes, that sounds like a real bug, will look into it -- thank you!
Huh, when I ran this locally with my jaffle shop project I got a dbt (not a DuckDB) compilation error when I tried to name a model `Orders.sql`:
```
05:11:57  Completed with 1 error and 0 warnings:
05:11:57
05:11:57  Compilation Error in model Orders (models/Orders.sql)
  When searching for a relation, dbt found an approximate match. Instead of guessing
  which relation to use, dbt will move on. Please delete "jaffle_shop"."main"."orders", or rename it to be less ambiguous.
  Searched for: "jaffle_shop"."main"."Orders"
  Found: "jaffle_shop"."main"."orders"
  > in macro load_cached_relation (macros/adapters/relation.sql)
  > called by macro materialization_table_duckdb (macros/materializations/table.sql)
  > called by model Orders (models/Orders.sql)
```
That was against dbt-core 1.7.11; do you not see that? Or am I missing something about how to reproduce the error you got with the `tmp` renaming not working?
The error you have is because the `orders` identifier already exists; if you drop `orders` and rerun the `Orders` model, it should work on your end. To see the `tmp` rename, try adding the `--debug` option to your command.
And I must clarify two things:
1. Sorry, I missed a step in the repro: attaching a Postgres database that the model will be built on, e.g.:

       extensions:
         - postgres
       attach:
         - path: 'postgres:host= password= user= dbname='
           alias: pg

   I didn't add it since I wanted to focus our attention on DuckDB.
2. I could have opened this issue against DuckDB, since I suspected the issue is from DuckDB, except for duckdb/duckdb#10356 (comment). Given DuckDB's deliberate identifier case insensitivity, and that the problem happened with dbt in combination with DuckDB, I thought it only rational to post here first and see if we can reach a solution together.
The fix should come from DuckDB's Postgres support, not a workaround on your end.