duckdb/duckdb_iceberg

IOException when querying table with a list<int> column.

daturkel opened this issue ยท 4 comments

Hello, thanks for your great work on duckdb-iceberg!

I'm encountering an issue where attempting to query an iceberg table with iceberg_scan fails if the table has a column which is a list of ints (list<int>).

table_with_lists # a pyiceberg.table.Table object
table_without_lists # another pyiceberg.table.Table object

duckdb.sql("INSTALL ICEBERG; LOAD ICEBERG;")

# querying the table that doesn't feature any lists works fine
duckdb.sql(f"SELECT * FROM iceberg_scan('{table_without_lists.metadata_location}') LIMIT 10")

# querying the table that *does* have lists throws an error
duckdb.sql(f"SELECT * FROM iceberg_scan('{table_with_lists.metadata_location}') LIMIT 10")
# IOException: IO Error: Invalid field found while parsing field: type

This seems to be related to an error parsing the table schema. I was wondering if there maybe was no logic for complex types in the duckdb-iceberg extension, but it appears that there is.

Hey @daturkel!

There should be a workaround here #45. It requires installing the nightly extension though!

(Whoops, originally tried this without adding the skip_schema_inference argument. Tried again and it works now, update below.)

Thanks for the quick response! I tried the workaround suggested in that thread:

duckdb.sql("force install iceberg from 'http://nightly-extensions.duckdb.org'")
duckdb.sql("load iceberg")
duckdb.sql(f"SELECT * FROM iceberg_scan('{metadata_location}', skip_schema_inference=True) LIMIT 10")

and it worked perfectly!

I think I've addressed this with #50, it should fix parsing the metadata.

I'm getting the same error with DuckDB v1.0.0 1f98600c2c and some Iceberg data with nested fields

๐ŸŸกโ—— SELECT *
    FROM iceberg_scan('s3://my-bucket/iceberg/rmoff.db/rmoff_basket02/metadata/00640-ed419044-046c-44c0-a20a-e51f0cce381f.metadata.json');
Run Time (s): real 1.685 user 0.008519 sys 0.006796
IO Error: Invalid field found while parsing field: type

It works fine if I use skip_schema_inference=True. Here's the schema as seen by duckDB if I CTAS the iceberg table into a local one:

๐ŸŸกโ—— describe tmp
    ;
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   column_name   โ”‚                                    column_type                                     โ”‚  null   โ”‚   key   โ”‚ default โ”‚  extra  โ”‚
โ”‚     varchar     โ”‚                                      varchar                                       โ”‚ varchar โ”‚ varchar โ”‚ varchar โ”‚ varchar โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ basketId        โ”‚ VARCHAR                                                                            โ”‚ YES     โ”‚         โ”‚         โ”‚         โ”‚
โ”‚ customerId      โ”‚ VARCHAR                                                                            โ”‚ YES     โ”‚         โ”‚         โ”‚         โ”‚
โ”‚ customerName    โ”‚ VARCHAR                                                                            โ”‚ YES     โ”‚         โ”‚         โ”‚         โ”‚
โ”‚ customerAddress โ”‚ VARCHAR                                                                            โ”‚ YES     โ”‚         โ”‚         โ”‚         โ”‚
โ”‚ storeId         โ”‚ VARCHAR                                                                            โ”‚ YES     โ”‚         โ”‚         โ”‚         โ”‚
โ”‚ storeName       โ”‚ VARCHAR                                                                            โ”‚ YES     โ”‚         โ”‚         โ”‚         โ”‚
โ”‚ storeLocation   โ”‚ VARCHAR                                                                            โ”‚ YES     โ”‚         โ”‚         โ”‚         โ”‚
โ”‚ products        โ”‚ STRUCT(productName VARCHAR, quantity INTEGER, unitPrice FLOAT, category VARCHAR)[] โ”‚ YES     โ”‚         โ”‚         โ”‚         โ”‚
โ”‚ timestamp       โ”‚ VARCHAR                                                                            โ”‚ YES     โ”‚         โ”‚         โ”‚         โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Run Time (s): real 0.002 user 0.000879 sys 0.000218

The Iceberg manifest is:

{
  "format-version" : 2,
  "table-uuid" : "59dc9799-83e7-4bec-8a84-2a74e3abeaf6",
  "location" : "s3://rmoff/iceberg-test/rmoff.db/rmoff_basket02",
  "last-sequence-number" : 646,
  "last-updated-ms" : 1717678301424,
  "last-column-id" : 14,
  "current-schema-id" : 0,
  "schemas" : [ {
    "type" : "struct",
    "schema-id" : 0,
    "fields" : [ {
      "id" : 1,
      "name" : "basketId",
      "required" : false,
      "type" : "string"
    }, {
      "id" : 2,
      "name" : "customerId",
      "required" : false,
      "type" : "string"
    }, {
      "id" : 3,
      "name" : "customerName",
      "required" : false,
      "type" : "string"
    }, {
      "id" : 4,
      "name" : "customerAddress",
      "required" : false,
      "type" : "string"
    }, {
      "id" : 5,
      "name" : "storeId",
      "required" : false,
      "type" : "string"
    }, {
      "id" : 6,
      "name" : "storeName",
      "required" : false,
      "type" : "string"
    }, {
      "id" : 7,
      "name" : "storeLocation",
      "required" : false,
      "type" : "string"
    }, {
      "id" : 8,
      "name" : "products",
      "required" : false,
      "type" : {
        "type" : "list",
        "element-id" : 10,
        "element" : {
          "type" : "struct",
          "fields" : [ {
            "id" : 11,
            "name" : "productName",
            "required" : false,
            "type" : "string"
          }, {
            "id" : 12,
            "name" : "quantity",
            "required" : false,
            "type" : "int"
          }, {
            "id" : 13,
            "name" : "unitPrice",
            "required" : false,
            "type" : "float"
          }, {
            "id" : 14,
            "name" : "category",
            "required" : false,
            "type" : "string"
          } ]
        },
        "element-required" : false
      }
    }, {
      "id" : 9,
      "name" : "timestamp",
      "required" : false,
      "type" : "string"
    } ]
  } ],
[โ€ฆ]