duckdb/duckdb-postgres

Support for querying views

Closed this issue ยท 8 comments

hello, i wanted to open this issue to track potential efforts on using postgres_attach/postgres_scanner to be able to query views. i didn't realize querying views wasn't possible until I saw it commented on #20 and i didn't see another open issue specifically for this use case.

Please support materialized views as well. In fact they might be easier to support because their data is already, well, materialized.
Note that their meta-data is not in information_schema, but I gather from #84 that the scanner is using pg_class and other systems tables.

@dpprdan I've had success with materialized views!

But agree views would be nice too and relatedly, using indices when the a filter is applied that could benefit from an index - which I don't believe happens today. For example I want to be able to scan on created_at > X , using an index on created_at, but for >3M rows the performance is quite poor even when the number of rows matching that condition is tiny.

@vimota how have you been able to query materialized views? When I try it says the table was not found. When I do PRAGMA show_tables, materialized view don't appear.

I guess @vimota used SELECT * FROM POSTGRES_SCAN(), see below.

I'll add a minimal working example regarding matviews:

Setup
  1. spin-up a local postgres docker container
    docker run --name mypg -d --rm -p 5432:5432 -e POSTGRES_PASSWORD=mysecretpassword postgres
  2. connect and create table and matview (I use a connection service file (service=test) and a password file to connect).
    PS C:\Users\Daniel> psql service=test
    psql (15.3, server 15.4 (Debian 15.4-1.pgdg120+1))
    
    postgres=# CREATE TABLE mytable AS SELECT 2 AS col;
    SELECT 1
    
    postgres=# CREATE MATERIALIZED VIEW mymatview AS SELECT 1 AS col;
    SELECT 1
    
    postgres=# \d
                     List of relations
     Schema |   Name    |       Type        |  Owner
    --------+-----------+-------------------+----------
     public | mymatview | materialized view | postgres
     public | mytable   | table             | postgres
    (2 rows)
PS C:\Users\Daniel> .\duckdb
v0.8.1 6536a77232
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

D INSTALL postgres;
D LOAD postgres;
D CALL POSTGRES_ATTACH('service=test');
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Success โ”‚
โ”‚ boolean โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 0 rows  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
D PRAGMA show_tables;
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  name   โ”‚
โ”‚ varchar โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ mytable โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
D SELECT * FROM mytable;
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  col  โ”‚
โ”‚ int32 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚     2 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
D SELECT * FROM mymatview;
Error: Catalog Error: Table with name mymatview does not exist!
Did you mean "mytable"?
LINE 1: SELECT * FROM mymatview;
                      ^
D SELECT * FROM POSTGRES_SCAN('service=test', 'public', 'mymatview');
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  col  โ”‚
โ”‚ int32 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚     1 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
D .quit

So PRAGMA show_tables; and SELECT * FROM mymatview; don't work with matviews (they are not POSTGRES_ATTACH()ed), while querying an individual matview with SELECT * FROM POSTGRES_SCAN() does. (Update: My guess is that this is because matviews are essentially stored as tables, while their metadata is stored differently and similar to that of (regular) views).

Got it โ€“ thanks for the extremely thorough example and setup instructions, that's super helpful. Hopefully views will also be similarly supported soon.

Thanks for the report! I've added support for querying views (and more) in #111.

@Mytherin Thanks a bunch for merging this! I was trying to test it out but maybe it's still too soon or I have the syntax wrong.

Do we still have to do INSTALL postgres; LOAD postgres; LOAD postgres_scanner and then follow that with ATTACH 'dbname=my_db' AS postgres (TYPE POSTGRES); I'm getting the following error using duckdb 0.9.1:

Error: Binder Error: Unrecognized storage type "POSTGRES"

The released binaries for v0.9.1 still contain the old version - if you want to try out the extension you will need to compile it from source currently. We will likely upload the new version in the coming weeks.