Issue with cached credentials when attempting to use different keyfiles in the same Spark App
josecsotomorales opened this issue · 2 comments
Hey folks, I have a Spark application that reads from a source bucket and writes into a target bucket, and I'm running into issues when setting the keyfile for the second operation as a Hadoop configuration. In theory the keyfile should be overridden, but that's not the case: the application always uses the first keyfile. I've tried unsetting and clearing the Hadoop configs, but for whatever reason the connector keeps using the first credentials file. Here is a code snippet of what I'm trying to accomplish:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Multiple GCS Service Accounts") \
    .getOrCreate()

# Point the GCS connector at the first key file
spark.conf.set("spark.hadoop.fs.gs.auth.service.account.json.keyfile", "/path/to/first/keyfile.json")

# Perform Spark operations using the first key file

# Switch to a different key file
spark.conf.set("spark.hadoop.fs.gs.auth.service.account.json.keyfile", "/path/to/second/keyfile.json")

# Perform Spark operations using the second key file

spark.stop()
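For completeness, this is roughly what the "unset and clear the Hadoop configs" attempt looked like, going through the SparkContext's Hadoop configuration directly (a sketch; the paths are placeholders):

# Sketch of setting the keyfile directly on the Hadoop configuration instead of
# through spark.conf; the result is the same, the first credentials keep being used.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()

# Drop the first key file and point the connector at the second one
hadoop_conf.unset("fs.gs.auth.service.account.json.keyfile")
hadoop_conf.set("fs.gs.auth.service.account.json.keyfile", "/path/to/second/keyfile.json")

# Subsequent reads/writes still authenticate with the first key file,
# presumably because the gs FileSystem instance (and its credentials) is cached.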
For the Hadoop AWS and Hadoop Azure connectors, there are multiple ways to set credentials per bucket. I would like to have the same capability in the GCS connector, for example:
// Note the bucket variable: access and secret keys can be set per bucket
spark.sparkContext.hadoopConfiguration.set(s"fs.s3a.bucket.$bucket.access.key", accessKey)
spark.sparkContext.hadoopConfiguration.set(s"fs.s3a.bucket.$bucket.secret.key", secretKey)
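For GCS, something along these lines is what I have in mind (purely hypothetical; as far as I know a per-bucket property prefix like this does not exist in the connector today), e.g. in PySpark:

# Hypothetical per-bucket keyfile configuration for the GCS connector;
# the "fs.gs.bucket.<bucket>." prefix is made up here to illustrate the request,
# and "source-bucket" / "target-bucket" are placeholder bucket names.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.gs.bucket.source-bucket.auth.service.account.json.keyfile",
                "/path/to/first/keyfile.json")
hadoop_conf.set("fs.gs.bucket.target-bucket.auth.service.account.json.keyfile",
                "/path/to/second/keyfile.json")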
@medb @singhravidutt do you know if this is even possible with the current implementation?