Question: how to speed up iceberg reads
michaelzwong opened this issue · 1 comments
michaelzwong commented
I'm currently reading from my glue-cataloged iceberg table using the following:
import duckdb

# Load the S3 and Iceberg extensions and configure credentials.
duckdb.sql(
    f"""
    INSTALL httpfs;
    LOAD httpfs;
    set s3_region = 'us-west-2';
    set s3_access_key_id = '{settings.AWS_ACCESS_KEY_ID}';
    set s3_secret_access_key = '{settings.AWS_SECRET_ACCESS_KEY}';
    INSTALL iceberg;
    LOAD iceberg;
    """
)
res = duckdb.execute(
    "SELECT * FROM iceberg_scan('s3://foopath') LIMIT 100"
)
Execution is very slow compared to reading the .parquet files directly from the same path (e.g., about 2 minutes vs. 2 seconds):
res = duckdb.execute(
    "SELECT * FROM parquet_scan('s3://foopath/*.parquet') LIMIT 100"
)
I'd like to know what I'm doing wrong, or whether someone has a solution.
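For anyone who wants to reproduce the comparison, here is a minimal timing sketch. The helper names `time_call` and `compare_scans` are hypothetical and not part of DuckDB. The sketch assumes the extension and credential setup above has already been run in the same process, and that the Parquet data files sit directly under the table path, as in the queries above.

```python
import time


def time_call(label, fn):
    """Call fn() once and return (label, result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return label, result, elapsed


def compare_scans(path):
    """Time iceberg_scan against parquet_scan for the same table path.

    Assumption: duckdb is installed and httpfs/iceberg are already loaded
    and configured, as in the setup snippet from this post.
    """
    import duckdb  # imported here so time_call stays usable without duckdb

    queries = {
        "iceberg_scan": f"SELECT * FROM iceberg_scan('{path}') LIMIT 100",
        "parquet_scan": f"SELECT * FROM parquet_scan('{path}/*.parquet') LIMIT 100",
    }
    timings = {}
    for label, sql in queries.items():
        # fetchall() forces the rows to be materialized inside the timed region.
        _, _, elapsed = time_call(label, lambda: duckdb.execute(sql).fetchall())
        timings[label] = elapsed
    return timings
```

Calling `compare_scans('s3://foopath')` would return a dict of elapsed seconds per scan type. Prefixing either query with `EXPLAIN ANALYZE` may also help show where the time is being spent.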