AbsaOSS/pramen

Allow alternative paths to tables when exposing data via Hive.

Closed this issue · 0 comments

Background

AWS S3 access points can be used for fine-grained permissions, so a metastore table should use the access point for writing to the table. For reading data from the table, however, Athena should use the underlying bucket. In essence, when exposing a table via Hive, the alternative path should be used.

Feature

Allow alternative paths to tables when exposing data via Hive.

Example

For metastore:

pramen.metastore {
  tables.2 = [
    {
      name = "table2"
      format = "parquet"
      path = "s3://access_point/abc"
      hive.table = "hive_table_not_supported"
      hive.path = "s3://bucket/path/abc"
    }
  ]
}
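
The intended lookup for the metastore example can be sketched roughly as follows. This is only an illustration in Python, not Pramen's actual implementation: the function name `resolve_hive_location` is hypothetical, the table definition is simplified to a flat dictionary of the config keys, and falling back to `path` when `hive.path` is missing is an assumption based on the path being described as "alternative".

```python
from typing import Mapping, Optional


def resolve_hive_location(table: Mapping[str, str]) -> Optional[str]:
    """Return the location to register in Hive for a metastore table entry.

    Hypothetical helper: uses 'hive.path' when it is set and falls back to
    'path' otherwise (the fallback is an assumption, not confirmed behavior).
    Returns None when the table is not exposed via Hive at all.
    """
    if "hive.table" not in table:
        return None
    return table.get("hive.path", table["path"])


# Mirrors the metastore example: data is written through the access point,
# while Hive (and therefore Athena) is pointed at the underlying bucket.
table2 = {
    "name": "table2",
    "format": "parquet",
    "path": "s3://access_point/abc",
    "hive.table": "hive_table_not_supported",
    "hive.path": "s3://bucket/path/abc",
}

print(resolve_hive_location(table2))
```

With this shape, existing table definitions that do not set `hive.path` would keep exposing `path` unchanged, so the option stays backwards compatible.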

For sinks:

output {
  # Optional when running Enceladus from Pramen
  dataset.name = "my_dataset"
  dataset.version = 2

  # Optional publish base path (for detecting version number)
  publish.base.path = "s3://access_point/data_lake/publish"
  # Optional Hive table to repair after Enceladus is executed
  hive.table = "my_database.my_table"
  hive.path = "s3://bucket/path/abc"
}