delta-io/connectors

Flink DeltaSource should use Flink Configuration and not hadoop configuration

ramkrish86 opened this issue · 2 comments

Today, DeltaSource for Flink uses a Hadoop Configuration to resolve the path. This mandates that a core-site.xml with the ABFS storage key etc. be present. If we passed a Flink Configuration instead, anything added to flink-conf.yaml could be used directly, without introducing a separate core-site.xml. This is helpful in Kubernetes-based managed deployments.

This seems to be the current convention across all the connectors, so closing this for now.

@ramkrish86
Actually, I think you can already use flink-conf.yaml.
You can use Flink's GlobalConfiguration.loadConfiguration() from the flink-core module, which creates Flink's org.apache.flink.configuration.Configuration object by reading the cluster config from the directory pointed to by the FLINK_CONF_DIR environment variable. It is expected that FLINK_CONF_DIR will contain flink-conf.yaml.
Then you can write custom code to extract properties from the Flink configuration object and build a Hadoop Configuration object from them.
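The "custom code" step above can be sketched in plain Java. This is a minimal illustration, not the connector's actual implementation: the Flink configuration is represented as a plain map (as Configuration.toMap() would return), the result is a map of Hadoop property names to values, and the `flink.hadoop.` prefix is an assumption for illustration, chosen to mirror Flink's convention for forwarding Hadoop properties.

```java
import java.util.HashMap;
import java.util.Map;

public class FlinkToHadoopConfig {

    // Prefix under which Hadoop/ABFS properties are assumed to live in
    // flink-conf.yaml, e.g. "flink.hadoop.fs.azure.account.auth.type".
    // The prefix is an assumption for this sketch; check what your
    // deployment actually uses.
    private static final String HADOOP_PREFIX = "flink.hadoop.";

    /**
     * Copies every entry starting with HADOOP_PREFIX from the Flink
     * configuration (here a plain map) into a map of Hadoop properties,
     * stripping the prefix. Entries without the prefix are ignored.
     */
    public static Map<String, String> extractHadoopProperties(Map<String, String> flinkConf) {
        Map<String, String> hadoopProps = new HashMap<>();
        for (Map.Entry<String, String> e : flinkConf.entrySet()) {
            if (e.getKey().startsWith(HADOOP_PREFIX)) {
                hadoopProps.put(e.getKey().substring(HADOOP_PREFIX.length()), e.getValue());
            }
        }
        return hadoopProps;
    }

    public static void main(String[] args) {
        Map<String, String> flinkConf = new HashMap<>();
        flinkConf.put("flink.hadoop.fs.azure.account.auth.type", "SharedKey");
        flinkConf.put("parallelism.default", "4"); // not a Hadoop property, skipped

        Map<String, String> hadoopProps = extractHadoopProperties(flinkConf);
        System.out.println(hadoopProps.get("fs.azure.account.auth.type")); // SharedKey
    }
}
```

In real connector code you would then copy each extracted entry into an org.apache.hadoop.conf.Configuration via its set(key, value) method.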

In connector 0.7.0-SNAPSHOT I'm using this approach to extract Hadoop file system properties (not Delta ones).
There I've introduced an "internal" HadoopUtils.getHadoopConfiguration(configuration) helper.

This method creates a Hadoop Configuration object from various sources.
You can call it like this:

Configuration hadoopConfiguration =
            HadoopUtils.getHadoopConfiguration(GlobalConfiguration.loadConfiguration());
