Catalog Error postgres_scanner: postgres_scan already exists
alexismanuel opened this issue · 5 comments
Hi, I did not find anything related to my problem so here is my current issue hopping it is not a duplicate. I have the following error when triyng to use postgres_scanner for a simple model:
dbt build --select my_model
12:22:38 Running with dbt=1.5.1
12:22:41 Found 188 models, 2 tests, 8 snapshots, 0 analyses, 821 macros, 0 operations, 0 seed files, 140 sources, 0 exposures, 0 metrics, 0 groups
12:22:41
12:22:41
12:22:41 Finished running in 0 hours 0 minutes and 0.46 seconds (0.46s).
12:22:41 Encountered an error:
Runtime Error
Invalid Input Error: Initialization function "postgres_scanner_init" from file "USER_PATH/.duckdb/extensions/v0.6.1/osx_arm64/postgres_scanner.duckdb_extension" threw an exception: "Catalog Error: Table Function with name "postgres_scan" already exists!"
Here is my setup:
- M2 MBP
- Python 3.10.1
- duckdb v0.6.1
- dbt-core & dbt-duckdb version 1.5.1 (installed with pip3)
- my_model.py:
{{ config(materialized='table') }}
SELECT
*
FROM
{{ source('my_schema', 'my_upstream') }}
- schema.yml:
sources:
- name: my_schema
meta:
external_location: "postgres_scan('dbname=postgres user=postgres password=postgres host=postgres port=5432', 'my_schema', '{name}')"
tables:
- name: my_upstream
- profiles.yml:
dev:
type: duckdb
path: ./my_db.duckdb
extensions:
- postgres_scanner
threads: 4
Thanks in advance for your guidance !
Hey @alexismanuel, thanks for sending this my way! I easily reproduced the error you saw, so thank you for the detailed info on your environmental setup. The problem, it seems, is tied to DuckDB 0.6.1-- using a later version of DuckDB (0.7.1 or 0.8.0) did not trigger that error when I loaded the postgres_scanner extension.
Is upgrading to a later version of DuckDB an option in your environment? If not, I think we'll need to do some work to figure out what is going on upstream and see if we can get a good build of the postgres_scanner extension for 0.6.1.
Ah ha! No-- it's not an upstream issue, it's some sort of interaction between the way dbt-duckdb loads extensions and the way this extension seems to work on 0.6.1-- this is fascinating, going to dive in some more here
Thanks for your answer and good luck with the researches. On my side, I managed to upgrade DuckDB up to version 0.8.0 and it works like a charm !
I got the same error with these versions of the packages:
dbt-core 1.7.3
dbt-duckdb 1.7.0
dbt-extractor 0.5.1
dbt-semantic-interfaces 0.4.2
duckdb 0.9.2
After a long period of troubleshooting it I realized it was explicitly connected to concurrency and threads, because if I changed the number of threads from 4 to 1 in my profiles.yml, dbt run completed successfully. Then I figured out that the reason this happened was because I had not explicitly declared the postgres_scanner extension in profiles.yml, and it still worked (models were created, albeit a bit randomly and after a couple of dbt runs). So I did not understand until after a long time that not declaring the extension in profiles.yml causes trouble when using multiple threads.
I write this comment in case it can be helpful for others, and maybe the documentation could reflect that declaring it is not optional if you want to use multiple threads. @jwills