AbsaOSS/atum

PoC _INFO files generation in S3

Closed this issue · 0 comments

Enceladus expects _INFO files to be generated in the output directory alongside the Spark output.

This feature was originally implemented using the HDFS API. For AWS S3, we need to replicate the functionality. Options are:

  • AWS SDK for S3 API (primary option)
  • using temp HDFS location and copying the _INFO file(s) over (fallback option)
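For the primary option, one of the first steps would be mapping the Spark output directory to the S3 bucket and key where the _INFO file should land. The following is a minimal sketch of that mapping only, using nothing beyond the JDK. `S3Location` and `InfoFileLocation` are hypothetical names chosen for illustration and are not part of Atum; the accepted URI schemes are also an assumption.

```scala
import java.net.URI

// Hypothetical helper type (not part of Atum): identifies one object in S3.
final case class S3Location(bucket: String, key: String)

// Hypothetical helper (not part of Atum): derives where the _INFO file
// would be written for a given Spark output directory.
object InfoFileLocation {
  val InfoFileName = "_INFO"

  // Assumption: output paths use one of these schemes.
  private val S3Schemes = Set("s3", "s3a", "s3n")

  def forOutputDir(outputDir: String): S3Location = {
    val uri = new URI(outputDir)
    val scheme = Option(uri.getScheme).getOrElse("")
    require(S3Schemes.contains(scheme), s"Not an S3 path: $outputDir")

    // getHost returns null for some valid bucket names (e.g. with underscores),
    // so fall back to the raw authority component.
    val bucket = Option(uri.getHost).orElse(Option(uri.getAuthority)).getOrElse("")
    require(bucket.nonEmpty, s"Missing bucket in: $outputDir")

    // Normalize the directory part so the key has no leading or trailing slash.
    val prefix = Option(uri.getPath).getOrElse("").stripPrefix("/").stripSuffix("/")
    val key = if (prefix.isEmpty) InfoFileName else s"$prefix/$InfoFileName"

    S3Location(bucket, key)
  }
}
```

The resulting bucket and key could then be handed to whichever S3 client the PoC settles on (for example a put-object call in the AWS SDK), with the serialized _INFO content as the body. That upload step is deliberately left out here because it depends on the SDK version and credentials setup chosen.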

The most prominent entry point should be:
AtumImplicits.SparkSessionWrapper(spark) and, internally, ControlFrameworkState.storeCurrentInfoFile