IOException when querying table with a list<int> column.
daturkel opened this issue ยท 4 comments
Hello, thanks for your great work on duckdb-iceberg!
I'm encountering an issue where attempting to query an iceberg table with iceberg_scan
fails if the table has a column which is a list of ints (list<int>
).
table_with_lists # a pyiceberg.table.Table object
table_without_lists # another pyiceberg.table.Table object
duckdb.sql("INSTALL ICEBERG; LOAD ICEBERG;")
# querying the table that doesn't feature any lists works fine
duckdb.sql(f"SELECT * FROM iceberg_scan('{table_without_lists.metadata_location}') LIMIT 10")
# querying the table that *does* have lists throws an error
duckdb.sql(f"SELECT * FROM iceberg_scan('{table_with_lists.metadata_location}') LIMIT 10")
# IOException: IO Error: Invalid field found while parsing field: type
This seems to be related to an error parsing the table schema. I was wondering if there maybe was no logic for complex types in the duckdb-iceberg extension, but it appears that there is.
(Whoops, originally tried this without adding the skip_schema_inference
argument. Tried again and it works now, update below.)
Thanks for the quick response! I tried the workaround suggested in that thread:
duckdb.sql("force install iceberg from 'http://nightly-extensions.duckdb.org'")
duckdb.sql("load iceberg")
duckdb.sql(f"SELECT * FROM iceberg_scan('{metadata_location}', skip_schema_inference=True) LIMIT 10")
and it worked perfectly!
I think I've addressed this with #50, it should fix parsing the metadata.
I'm getting the same error with DuckDB v1.0.0 1f98600c2c and some Iceberg data with nested fields
๐กโ SELECT *
FROM iceberg_scan('s3://my-bucket/iceberg/rmoff.db/rmoff_basket02/metadata/00640-ed419044-046c-44c0-a20a-e51f0cce381f.metadata.json');
Run Time (s): real 1.685 user 0.008519 sys 0.006796
IO Error: Invalid field found while parsing field: type
It works fine if I use skip_schema_inference=True
. Here's the schema as seen by duckDB if I CTAS the iceberg table into a local one:
๐กโ describe tmp
;
โโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโโโ
โ column_name โ column_type โ null โ key โ default โ extra โ
โ varchar โ varchar โ varchar โ varchar โ varchar โ varchar โ
โโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโโโค
โ basketId โ VARCHAR โ YES โ โ โ โ
โ customerId โ VARCHAR โ YES โ โ โ โ
โ customerName โ VARCHAR โ YES โ โ โ โ
โ customerAddress โ VARCHAR โ YES โ โ โ โ
โ storeId โ VARCHAR โ YES โ โ โ โ
โ storeName โ VARCHAR โ YES โ โ โ โ
โ storeLocation โ VARCHAR โ YES โ โ โ โ
โ products โ STRUCT(productName VARCHAR, quantity INTEGER, unitPrice FLOAT, category VARCHAR)[] โ YES โ โ โ โ
โ timestamp โ VARCHAR โ YES โ โ โ โ
โโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโโโ
Run Time (s): real 0.002 user 0.000879 sys 0.000218
The Iceberg manifest is:
{
"format-version" : 2,
"table-uuid" : "59dc9799-83e7-4bec-8a84-2a74e3abeaf6",
"location" : "s3://rmoff/iceberg-test/rmoff.db/rmoff_basket02",
"last-sequence-number" : 646,
"last-updated-ms" : 1717678301424,
"last-column-id" : 14,
"current-schema-id" : 0,
"schemas" : [ {
"type" : "struct",
"schema-id" : 0,
"fields" : [ {
"id" : 1,
"name" : "basketId",
"required" : false,
"type" : "string"
}, {
"id" : 2,
"name" : "customerId",
"required" : false,
"type" : "string"
}, {
"id" : 3,
"name" : "customerName",
"required" : false,
"type" : "string"
}, {
"id" : 4,
"name" : "customerAddress",
"required" : false,
"type" : "string"
}, {
"id" : 5,
"name" : "storeId",
"required" : false,
"type" : "string"
}, {
"id" : 6,
"name" : "storeName",
"required" : false,
"type" : "string"
}, {
"id" : 7,
"name" : "storeLocation",
"required" : false,
"type" : "string"
}, {
"id" : 8,
"name" : "products",
"required" : false,
"type" : {
"type" : "list",
"element-id" : 10,
"element" : {
"type" : "struct",
"fields" : [ {
"id" : 11,
"name" : "productName",
"required" : false,
"type" : "string"
}, {
"id" : 12,
"name" : "quantity",
"required" : false,
"type" : "int"
}, {
"id" : 13,
"name" : "unitPrice",
"required" : false,
"type" : "float"
}, {
"id" : 14,
"name" : "category",
"required" : false,
"type" : "string"
} ]
},
"element-required" : false
}
}, {
"id" : 9,
"name" : "timestamp",
"required" : false,
"type" : "string"
} ]
} ],
[โฆ]