databricks/iceberg-kafka-connect

Data not visible from Dremio

Opened this issue · 1 comment

Hi,

With the configuration below, I ran the sink connector locally and saw data written to MinIO under the bucket path warehouse/data. But when I try to access it through the Dremio Nessie data source, no data is shown.
If data is inserted through Dremio, it is displayed, and that data is created under a different folder in MinIO. I am not sure whether this has something to do with the Nessie version.

iceberg-connector-config.properties

....
iceberg.tables.dynamic-enabled=true
iceberg.tables.route-field=db_table
iceberg.tables.auto-create-enabled=true
iceberg.catalog.catalog-impl=org.apache.iceberg.nessie.NessieCatalog
iceberg.catalog.uri=http://nessie:19120/api/v2
iceberg.catalog.ref=main
iceberg.catalog.authentication.type=NONE
iceberg.catalog.warehouse=s3a://warehouse
iceberg.catalog.s3.endpoint=http://minio:9000
iceberg.catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
iceberg.catalog.client.region=us-east-1
iceberg.catalog.s3.path-style-access=true

iceberg.tables.cdc-field=_cdc_op
iceberg.table.systemtable.id-columns=node_uuid
iceberg.tables.upsert-mode-enabled=true
iceberg.table.systemtable.partition-by=cluster_uuid
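
To check whether the commits are actually landing in Nessie, the table can be loaded directly with the Iceberg Java API. Below is a minimal sketch, assuming the iceberg-nessie and iceberg-aws modules are on the classpath; it reuses the catalog properties above (without the iceberg.catalog. prefix), and the identifier db.systemtable is a placeholder for whatever table the route field produced. If total-equality-deletes in the snapshot summary is non-zero, the table depends on merge-on-read delete files (expected in upsert mode), which not every engine version can apply.

import java.util.HashMap;
import java.util.Map;

import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.nessie.NessieCatalog;

public class CheckNessieTable {
  public static void main(String[] args) {
    // Same settings as the connector config above, minus the iceberg.catalog. prefix.
    Map<String, String> props = new HashMap<>();
    props.put("uri", "http://nessie:19120/api/v2");
    props.put("ref", "main");
    props.put("authentication.type", "NONE");
    props.put("warehouse", "s3a://warehouse");
    props.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
    props.put("s3.endpoint", "http://minio:9000");
    props.put("s3.path-style-access", "true");
    props.put("client.region", "us-east-1");

    NessieCatalog catalog = new NessieCatalog();
    catalog.initialize("nessie", props);

    // Placeholder identifier; substitute the table the route field created.
    Table table = catalog.loadTable(TableIdentifier.parse("db.systemtable"));

    Snapshot snapshot = table.currentSnapshot();
    if (snapshot == null) {
      System.out.println("No snapshot yet: the sink has not committed to this ref.");
      return;
    }
    // Summary keys such as total-data-files and total-equality-deletes show
    // whether the table carries merge-on-read delete files.
    snapshot.summary().forEach((k, v) -> System.out.println(k + " = " + v));
  }
}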

I have the same problem. The only solution seems to be running a rewrite after the data is committed, but in a streaming scenario with dynamic multi-table fan-out in upsert mode, rewriting in a separate process doesn't seem like a viable option.

I am pushing the data to a Nessie catalog...

Is there any configuration I am not aware of that would perform a rewrite at commit time with this sink connector?
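
For now, the only workaround I can see is a periodic compaction job in a separate process. Here is a rough sketch using Iceberg's Spark rewriteDataFiles action, assuming the same catalog settings as the config above; db.systemtable is again a placeholder, and the iceberg-spark-runtime, iceberg-nessie, and AWS bundle jars need to be on the Spark classpath.

import java.util.HashMap;
import java.util.Map;

import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.nessie.NessieCatalog;
import org.apache.iceberg.spark.actions.SparkActions;
import org.apache.spark.sql.SparkSession;

public class CompactTable {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-compaction")
        .master("local[*]")
        .getOrCreate();

    // Same catalog wiring as in the earlier inspection sketch.
    Map<String, String> props = new HashMap<>();
    props.put("uri", "http://nessie:19120/api/v2");
    props.put("ref", "main");
    props.put("authentication.type", "NONE");
    props.put("warehouse", "s3a://warehouse");
    props.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
    props.put("s3.endpoint", "http://minio:9000");
    props.put("s3.path-style-access", "true");
    props.put("client.region", "us-east-1");

    NessieCatalog catalog = new NessieCatalog();
    catalog.initialize("nessie", props);

    Table table = catalog.loadTable(TableIdentifier.parse("db.systemtable"));

    // Rewrites data files with any pending deletes applied; "rewrite-all"
    // forces rewriting every file, not just those below the size thresholds.
    SparkActions.get(spark)
        .rewriteDataFiles(table)
        .option("rewrite-all", "true")
        .execute();

    spark.stop();
  }
}

After this (plus an expire-snapshots pass to drop the dangling delete files), the table reads fine again from engines that only handle plain data files, but running it per table after every commit is exactly the overhead I was hoping to avoid.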