GoogleCloudDataproc/spark-bigquery-connector

Impersonate Service Account

lcaggio opened this issue · 1 comments

I am writing a PySpark script on GCP Dataproc to:

  • impersonate a service account
  • read data from GCS (using the credential of the impersonated SA)
  • write data to BigQuery (using the credential of the impersonated SA)

The Dataproc service account has the AccessTokenCreator role on the service account to be impersonated (delegated_sa). The delegated_sa has access to GCS and BQ.

Script

...
spark = (
    SparkSession.builder
    .appName("Read CSV from GCS and Write to BigQuery")
    # GCS connector: impersonate delegated_sa when reading from GCS
    .config('spark.hadoop.fs.gs.auth.impersonation.service.account', delegated_sa)
    # BigQuery connector: impersonate delegated_sa when writing to BigQuery
    .config('gcpImpersonationServiceAccount', delegated_sa)
    .getOrCreate()
)
...
data = (
    spark.read.format("csv")
    .schema(schema)
    .load(csv)
)
...
(
    data.write.format('bigquery')
    .option('table', dataset_table)
    .mode('append')
    .save()
)
...
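
The script sets the same service account under two different keys, one for the GCS connector and one for the BigQuery connector. A minimal sketch of building both from a single value so they cannot drift apart; the helper names `impersonation_conf` and `apply_conf` are hypothetical, and only the two config keys come from the script above.

```python
def impersonation_conf(delegated_sa):
    """Return the two settings used in the script, both pointing at one SA."""
    return {
        # GCS connector (Hadoop) impersonation key, as used in the script
        "spark.hadoop.fs.gs.auth.impersonation.service.account": delegated_sa,
        # BigQuery connector impersonation key, as used in the script
        "gcpImpersonationServiceAccount": delegated_sa,
    }


def apply_conf(builder, conf):
    """Call .config(key, value) on a SparkSession builder for every entry."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

With these helpers, the session could be created as `apply_conf(SparkSession.builder.appName("..."), impersonation_conf(delegated_sa)).getOrCreate()`.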

Error

        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:750)
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
GET https://www.googleapis.com/bigquery/v2/projects/dataproc/datasets/tables/customers?prettyPrint=false
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Access Denied: Table dataproc:customers: Permission bigquery.tables.get denied on table dataproc:customers (or it may not exist).",
    "reason" : "accessDenied"
  } ],
  "message" : "Access Denied: Table dataproc:dataproc_out.customers: Permission bigquery.tables.get denied on table dataproc:customers (or it may not exist).",
  "status" : "PERMISSION_DENIED"
}
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:428)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
        at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getTable(HttpBigQueryRpc.java:284)
        ... 45 more

Note

  • Service account impersonation works properly for GCS
  • If I pass the service account credentials as a JSON key (.config("credentials", BASE64)) instead of using impersonation, it works properly
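
For reference, the JSON key workaround passes the key to the connector as a Base64 string. A minimal sketch of producing that string from the key text; the function name `encode_key` is hypothetical, and how the key file is stored and read is left as an assumption.

```python
import base64
import json


def encode_key(key_json_text):
    """Base64-encode the text of a service account JSON key."""
    # Parse first so a malformed key fails here rather than inside Spark.
    json.loads(key_json_text)
    return base64.b64encode(key_json_text.encode("utf-8")).decode("ascii")
```

The result would be used as the BASE64 value in `.config("credentials", BASE64)`.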

Closing: I was using an old version of the BigQuery connector ... With spark-bigquery-with-dependencies_2.12-0.36.1.jar it works with no issues.
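
Since the root cause was an outdated connector jar, a quick check on the jar name can catch this earlier. A rough sketch, assuming the jar follows the `...-<major>.<minor>.<patch>.jar` naming seen above; the helper names are hypothetical, and 0.36.1 is only the version reported to work here, not a confirmed minimum for impersonation support.

```python
import re

# Version reported to work in this issue (not a documented minimum).
WORKING_VERSION = (0, 36, 1)


def connector_version(jar_name):
    """Extract (major, minor, patch) from a connector jar file name."""
    match = re.search(r"-(\d+)\.(\d+)\.(\d+)\.jar$", jar_name)
    if match is None:
        raise ValueError(f"cannot find a version in {jar_name!r}")
    return tuple(int(part) for part in match.groups())


def is_at_least(jar_name, minimum=WORKING_VERSION):
    """True if the jar's version is the given minimum or newer."""
    return connector_version(jar_name) >= minimum
```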