Allow configuring Spark options for individual metastore tables
yruslan commented
Background
Sometimes the 'magic' committer needs to be used for some tables in the metastore, while other tables are written using the default Spark configuration.
It would be helpful if a custom Spark configuration could be specified for individual tables in the metastore.
The original configuration should be restored after the write.
Feature
Allow specifying custom Spark configuration for individual metastore tables.
Example
pramen.metastore {
  tables = [
    {
      name = "my_table1"
      format = "parquet"
      path = "s3://bucket1/path1"
    },
    {
      name = "my_table2"
      format = "parquet"
      path = "s3a://bucket2/path2"
      spark.conf {
        spark.sql.sources.commitProtocolClass = "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol"
        spark.sql.parquet.output.committer.class = "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter"
      }
    }
  ]
}
Proposed Solution
- Add the ability to specify per-table Spark configuration options (e.g. under `spark.conf`) in the metastore table definition
- Make sure the original Spark configuration is restored after the write, including unsetting keys that were not previously set
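The save/override/restore logic could be sketched as follows. This is a minimal illustration, not Pramen's actual implementation: a plain dict stands in for `spark.conf`, and the function name `override_conf` is hypothetical. The key point is that pre-existing values are restored and newly introduced keys are unset, even if the write fails.

```python
from contextlib import contextmanager

@contextmanager
def override_conf(conf, overrides):
    """Apply per-table overrides to a Spark-style config and restore the
    original state afterwards. `conf` is a dict standing in for `spark.conf`
    in this sketch (real code would use spark.conf.set/getOption/unset)."""
    # Remember the current value of each key about to be changed;
    # None marks a key that did not exist before.
    saved = {key: conf.get(key) for key in overrides}
    conf.update(overrides)
    try:
        yield conf
    finally:
        for key, old in saved.items():
            if old is None:
                conf.pop(key, None)  # key was not set before: unset it
            else:
                conf[key] = old      # restore the previous value
```

A write for `my_table2` would then run inside `with override_conf(conf, table_spark_conf): ...`, so the committer settings apply only to that table's write.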