duckdb/dbt-duckdb

Incorrect Case Sensitivity Handling of Identifier Names in Table Creation

the-ALAM opened this issue · 4 comments

description

  • when using DBT along with DuckDB to create tables, i encountered a runtime error during the execution of the Zipcode model with it's materialization set to table.
  • we suspected a Postgres parser error folding unquoted identifiers into lower case

reproduction Steps

  1. set up DBT with DuckDB.
  2. define a model named Zipcode - or any other name with the first letter capitalized - and set the materialization to table.
  3. execute the DBT run/build command - also with --full-refresh -.

expected Behavior

  • the Zipcode model should be created successfully without encountering any errors.

detailed Behavior

  • start a transaction
  • create the new version of the table in a temporary location (suffixed __dbt_tmp )
  • rename the existing version of the table to be the backup version (suffixed __dbt_backup)
  • rename the new version of the table to remove the suffix (__dbt_tmp → ``)
  • commit the transaction
  • drop the old/backup version (on many databases, drops need to happen outside of transactions)

actual Behavior

  • encountering a runtime error as described above during the execution of the Zipcode model.
  • Runtime Error in model {model_name} (models_{dir_name}\{model_name}.sql) Failed to execute query "ALTER TABLE {schema_name}.Zipcode__dbt_tmp RENAME TO Zipcode": ERROR: relation "{schema_name}.zipcode__dbt_tmp" does not exist

discovery Process

  • discovered the bug while attempting to create the Zipcode model with materialization set to table.
  • it seems that DuckDB might be transforming the table name casing, causing the ALTER TABLE statement to fail since it can't find the relation because the name is changed from the name to the lower-cased name.

environment

  • Operating System: win11 & ubuntu 22.04
  • Python Version: 3.12.0
  • DBT Version: 1.7.10
  • DuckDB Version: 0.10.0

additional Information

  • when using materialization set to incremental and dropping the table beforehand, the issue does not occur.
  • this behavior suggests a potential casing bug in DuckDB that may affect the execution of certain SQL statements and usage in certain frameworks.

ah yes that sounds like a real bug, will look into it-- thank you!

huh when I ran this locally with my jaffle shop project I got a dbt (not a DuckDB) compilation error when I tried to name a model "Orders.sql":

05:11:57  Completed with 1 error and 0 warnings:
05:11:57
05:11:57    Compilation Error in model Orders (models/Orders.sql)
  When searching for a relation, dbt found an approximate match. Instead of guessing
  which relation to use, dbt will move on. Please delete "jaffle_shop"."main"."orders", or rename it to be less ambiguous.
  Searched for: "jaffle_shop"."main"."Orders"
  Found: "jaffle_shop"."main"."orders"

  > in macro load_cached_relation (macros/adapters/relation.sql)
  > called by macro materialization_table_duckdb (macros/materializations/table.sql)
  > called by model Orders (models/Orders.sql)```

that was against dbt-core 1.7.11; do you not see that? Or am I missing something about how to reproduce the error you got with the `tmp` renaming issue not working?

the error you have is because the orders identifier exists already, and if you dropped the orders and rerun the Orders model it should work on your end and to see the tmp try adding the --debug option to your command.

and i must clarify two things:

1- sorry i missed a step from the repro which is:
'attaching a postgres database the model will be build on'
eg;

extensions:
        - postgres
attach:
	- path: 'postgres:host= password= user= dbname='
      alias: pg

i didnt add it since i wanted to focus our attention on duckdb

2- second i could've made this issue in duckdb since i suspected the issue is from duckdb except for this duckdb/duckdb#10356 (comment),
so given duckdb's deliberate identifier case insensitivity and that the problem happened with DBT in combination with duckdb, i though it's only rational that i post here first and see if we can reach a solution together

the fix should come from duckdb's support for postgres, not a work around from your end