duckdb/duckdb_iceberg

Unable to do JOIN between to iceberg based tables

sarialalem1 opened this issue · 4 comments

Example of what I'm trying to do:

select * from iceberg_scan('s3://<url1>.metadata.json') as t1 join iceberg_scan('s3://t2.metadata.json') as t2 on 1=1

When running the above I get this error:
Binder Error: Duplicate alias \"iceberg_scan_data\" in query!

Although I'm passing an alias for each iceberg_scan operation, it's still assigning the same iceberg_scan_data alias to both.

A suggestion would be to add the alias ass a parameter to the function.

You can use WITH clause instead of alias to define both t1 and t2.

Yea this is a side-effect of the way things are implemented right now. This will likely change in the future, but I need to get some refactors in DuckDB-side before I can fix that

@samansmink maybe this would be a good solution? #57

In addition to @harel-e suggestion of WITH, you can also create views, which work fine:

sql = '''
CREATE SCHEMA IF NOT EXISTS tpc;

CREATE VIEW tpc.customer AS
SELECT * FROM iceberg_scan('/Users/mritchie712/blackbird/notebooks/iceduck_tpc/default.db/customer', allow_moved_paths = true);

CREATE VIEW tpc.lineitem AS
SELECT * FROM iceberg_scan('/Users/mritchie712/blackbird/notebooks/iceduck_tpc/default.db/lineitem', allow_moved_paths = true);

CREATE VIEW tpc.nation AS
SELECT * FROM iceberg_scan('/Users/mritchie712/blackbird/notebooks/iceduck_tpc/default.db/nation', allow_moved_paths = true);

CREATE VIEW tpc.orders AS
SELECT * FROM iceberg_scan('/Users/mritchie712/blackbird/notebooks/iceduck_tpc/default.db/orders', allow_moved_paths = true);

CREATE VIEW tpc.part AS
SELECT * FROM iceberg_scan('/Users/mritchie712/blackbird/notebooks/iceduck_tpc/default.db/part', allow_moved_paths = true);

CREATE VIEW tpc.partsupp AS
SELECT * FROM iceberg_scan('/Users/mritchie712/blackbird/notebooks/iceduck_tpc/default.db/partsupp', allow_moved_paths = true);

CREATE VIEW tpc.region AS
SELECT * FROM iceberg_scan('/Users/mritchie712/blackbird/notebooks/iceduck_tpc/default.db/region', allow_moved_paths = true);

CREATE VIEW tpc.supplier AS
SELECT * FROM iceberg_scan('/Users/mritchie712/blackbird/notebooks/iceduck_tpc/default.db/supplier', allow_moved_paths = true);
'''

# Execute the SQL
con.execute(sql)
sql = f"""
select * 
from 
tpc.customer as c inner join 
tpc.orders as o on c.c_custkey = o.o_custkey
limit 20;
"""
res = con.execute(sql).fetchdf()
res