exasol/hadoop-etl-udfs

Support HDFS HA environments

andrehacker opened this issue · 0 comments

Problem:
Production HDFS deployments very commonly have HDFS HA (high availability) configured. The ETL UDFs currently accept only a single HDFS namenode, so they fail in such environments when a failover occurs.

Potential Solutions

  • Support the full HDFS HA feature: This requires passing several new parameters to the ETL UDFs, such as the mapping from dfs.nameservices to logical namenodes and from those to the actual namenodes. This mapping is usually configured in hdfs-site.xml, and making that file available to the UDFs would add complexity.
  • Simple support: Allow users to specify a set of HDFS namenodes and, on connection problems, connect to another one. This must also work if the failover happens during a long-running import. This approach is relatively easy to use and implement while still offering the full functionality.
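To illustrate what the first option would require, the Java sketch below assembles the client-side properties that HDFS HA uses to map a nameservice to logical namenode IDs and then to real RPC addresses. The property keys are standard Hadoop HA settings normally placed in hdfs-site.xml; the nameservice `mycluster`, the IDs `nn1`/`nn2`, the hostnames, and the helper `haProperties` are hypothetical and not part of the ETL UDFs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HdfsHaProperties {
    // Hypothetical helper: builds the HA mapping
    // nameservice -> logical namenode IDs -> real RPC addresses.
    static Map<String, String> haProperties(String nameservice, Map<String, String> namenodes) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("dfs.nameservices", nameservice);
        // Comma-separated list of logical namenode IDs for this nameservice.
        props.put("dfs.ha.namenodes." + nameservice, String.join(",", namenodes.keySet()));
        // One RPC address per logical namenode ID.
        for (Map.Entry<String, String> nn : namenodes.entrySet()) {
            props.put("dfs.namenode.rpc-address." + nameservice + "." + nn.getKey(), nn.getValue());
        }
        // Client-side class that decides which namenode is currently active.
        props.put("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> namenodes = new LinkedHashMap<>();
        namenodes.put("nn1", "namenode1.example.com:8020"); // illustrative hosts
        namenodes.put("nn2", "namenode2.example.com:8020");
        haProperties("mycluster", namenodes).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Every one of these values would have to reach the UDFs as parameters (or via a distributed hdfs-site.xml), which is the added complexity the first option describes.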

After discussion with at least one user, we propose following the second approach.
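The second approach could look roughly like the following minimal Java sketch, assuming the user passes a list of namenode URIs. Each HDFS operation starts at the namenode that last worked and moves on to the next one when it fails with an `IOException`. Because the retry happens per operation rather than once at connection time, a failover in the middle of a long-running import only causes the failing operation to be retried against another namenode. The class `NamenodeFailover`, the interface `NamenodeCall`, and the URIs are hypothetical names for illustration, not existing ETL UDF code.

```java
import java.io.IOException;
import java.util.List;

public class NamenodeFailover {
    /** Hypothetical: one HDFS operation executed against a given namenode URI. */
    @FunctionalInterface
    interface NamenodeCall<T> {
        T call(String namenode) throws IOException;
    }

    private final List<String> namenodes;
    private int active = 0; // index of the namenode that last succeeded

    NamenodeFailover(List<String> namenodes) {
        if (namenodes.isEmpty()) {
            throw new IllegalArgumentException("at least one namenode is required");
        }
        this.namenodes = List.copyOf(namenodes);
    }

    /** Tries each namenode once, starting with the last good one; rethrows the last error if all fail. */
    <T> T execute(NamenodeCall<T> op) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < namenodes.size(); attempt++) {
            int idx = (active + attempt) % namenodes.size();
            try {
                T result = op.call(namenodes.get(idx));
                active = idx; // later operations start at the working namenode
                return result;
            } catch (IOException e) {
                last = e; // treat as a connection problem and try the next namenode
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        NamenodeFailover nn = new NamenodeFailover(List.of(
                "hdfs://namenode1.example.com:8020", "hdfs://namenode2.example.com:8020"));
        // Simulate namenode1 being unreachable after a failover.
        String result = nn.execute(uri -> {
            if (uri.contains("namenode1")) throw new IOException("connection refused: " + uri);
            return "connected to " + uri;
        });
        System.out.println(result);
    }
}
```

Remembering the last working namenode avoids paying the failed-connection cost on every subsequent operation after a failover.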