AbsaOSS/pramen

Allow configuring custom Spark configuration for metastore tables


Background

Sometimes the S3 'magic' committer is needed for some tables in the metastore, while writes to other tables should use the default Spark configuration.

It would be helpful if a custom Spark configuration could be specified for individual tables in the metastore.

The original Spark configuration should be restored after the write.

Feature

Allow specifying a custom Spark configuration for individual tables in the metastore.

Example

pramen.metastore {
  tables = [
    {
      name = "my_table1"
      format = "parquet"
      path = "s3://bucket1/path1"
    },
    {
      name = "my_table2"
      format = "parquet"
      path = "s3a://bucket2/path2"
      spark.conf {
        spark.sql.sources.commitProtocolClass = "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol"
        spark.sql.parquet.output.committer.class = "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter"
      }
    }
  ]
}
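
In this example, writes to my_table2 would use the S3A-optimized committers from Spark's spark-hadoop-cloud module, while my_table1 keeps the default commit protocol.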

Proposed Solution

  • Add the ability to specify a per-table Spark configuration in the metastore table definition
  • Make sure the original Spark configuration is restored after the write completes (see the sketch below)
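
A minimal sketch of the apply-and-restore behavior in Scala. The helper name withSparkConf and its placement are illustrative assumptions, not Pramen's actual API:

import org.apache.spark.sql.SparkSession

object SparkConfOverride {
  // Applies per-table Spark settings, runs the write action, and then
  // restores the previous values so writes to other tables are unaffected.
  def withSparkConf[T](spark: SparkSession, conf: Map[String, String])(action: => T): T = {
    // Remember the current value (if any) of every key about to be changed.
    val previous = conf.keys.map(key => key -> spark.conf.getOption(key)).toMap

    conf.foreach { case (key, value) => spark.conf.set(key, value) }
    try {
      action
    } finally {
      // Restore old values; unset keys that were not set before the write.
      previous.foreach {
        case (key, Some(value)) => spark.conf.set(key, value)
        case (key, None)        => spark.conf.unset(key)
      }
    }
  }
}

For my_table2 from the example above, the call could look like this (df is the DataFrame being written):

SparkConfOverride.withSparkConf(spark, Map(
  "spark.sql.sources.commitProtocolClass" -> "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
  "spark.sql.parquet.output.committer.class" -> "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter"
)) {
  df.write.mode("overwrite").parquet("s3a://bucket2/path2")
}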