trinodb/trino

Error with Hudi connector when using property hive.partition-projection-enabled

drautela-scwx opened this issue · 2 comments

We are getting an error when trying to use the Hudi connector with the Hive property hive.partition-projection-enabled=true:

Trino version: 465

/etc/trino/catalog/hudi.properties

connector.name=hudi
hive.metastore=glue
hive.partition-projection-enabled=true
fs.native-s3.enabled=true

Error

2024-12-13T21:53:08.929Z	ERROR	main	io.trino.server.Server	Configuration errors:

1) Error: Configuration property 'hive.partition-projection-enabled' was not used

1 error
io.airlift.bootstrap.ApplicationConfigurationException: Configuration errors:

1) Error: Configuration property 'hive.partition-projection-enabled' was not used

1 error
	at io.airlift.bootstrap.Bootstrap.configure(Bootstrap.java:240)
	at io.airlift.bootstrap.Bootstrap.initialize(Bootstrap.java:269)
	at io.trino.plugin.hudi.HudiConnectorFactory.createConnector(HudiConnectorFactory.java:96)
	at io.trino.plugin.hudi.HudiConnectorFactory.create(HudiConnectorFactory.java:65)
	at io.trino.connector.DefaultCatalogFactory.createConnector(DefaultCatalogFactory.java:211)
	at io.trino.connector.DefaultCatalogFactory.createCatalog(DefaultCatalogFactory.java:128)
	at io.trino.connector.LazyCatalogFactory.createCatalog(LazyCatalogFactory.java:45)
	at io.trino.connector.StaticCatalogManager.lambda$loadInitialCatalogs$1(StaticCatalogManager.java:161)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
	at java.base/java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:186)
	at io.trino.util.Executors.executeUntilFailure(Executors.java:46)
	at io.trino.connector.StaticCatalogManager.loadInitialCatalogs(StaticCatalogManager.java:155)
	at io.trino.server.Server.doStart(Server.java:156)
	at io.trino.server.Server.lambda$start$0(Server.java:94)
	at io.trino.$gen.Trino_465____20241213_215302_1.run(Unknown Source)
	at io.trino.server.Server.start(Server.java:94)
	at io.trino.server.TrinoServer.main(TrinoServer.java:37)
ebyhr commented

You can't set this config property here because it belongs to the Hive connector.

Thanks for looking into this, @ebyhr. Since our Hudi connector also uses the Hive (Glue) catalog, how do we set up Hive/Glue properties for the Hudi connector?

The documentation says that:
https://trino.io/docs/current/connector/hudi.html


Could you please confirm whether the Hudi connector can read Hudi tables that have partition projection enabled?

We are trying to set up the following in AWS:

  • Trino on EKS: version 465
  • HMS: Glue. We have databases for both Hudi and non-Hudi external tables backed by Parquet files.

Currently we are able to query non-Hudi tables successfully.

For Hudi tables, the query executes without any error but returns zero records. We have confirmed that records should be returned by running the same query in Athena.

We have two catalogs configured as follows:

(1) /etc/trino/catalog/awsdatacatalog.properties

connector.name=hive
hive.metastore=glue
hive.hive-views.enabled=true
hive.partition-projection-enabled=true
fs.native-s3.enabled=true
hive.hudi-catalog-name=hudi

(2) /etc/trino/catalog/hudi.properties

connector.name=hudi
hive.metastore=glue
fs.native-s3.enabled=true

We do have partition projection enabled for Hudi tables.

Zero records are returned for Hudi tables whether we use the awsdatacatalog catalog or the hudi catalog:

select *
from awsdatacatalog.hudi_db.test_table_ro
where partition1 = '123'
and partition2 = 'abc'
limit 10;

select *
from hudi.hudi_db.test_table_ro
where partition1 = '123'
and partition2 = 'abc'
limit 10;

Any suggestions on how to find out why zero records are returned? Is there any specific logging we could turn on to help debug this issue?
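One way to get more detail is Trino's optional etc/log.properties file, which sets the minimum log level for named logger hierarchies (DEBUG, INFO, WARN, or ERROR). The snippet below is a minimal sketch, assuming the default etc/ layout of the Trino server. The logger names are assumptions: io.trino.plugin.hudi comes from the package names in the stack trace above, and io.trino.plugin.hive.metastore.glue is assumed to be the Glue metastore package used with hive.metastore=glue in this Trino version. Verify both against your installation.

```properties
# etc/log.properties (sketch; verify logger names against your Trino version)
# Keep the rest of Trino at the default level
io.trino=INFO
# Assumed: verbose output from the Hudi connector (package seen in the stack trace)
io.trino.plugin.hudi=DEBUG
# Assumed: verbose output from the Glue metastore client
io.trino.plugin.hive.metastore.glue=DEBUG
```

After a restart, DEBUG messages from these loggers appear in the server log. Whether they show which partitions the connector lists for test_table_ro depends on what the connector logs at that level, so treat this as a starting point rather than a guaranteed diagnosis.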