Dataform Authentication Scopes
Opened this issue · 0 comments
The Problem:
Say I have the following SQLX file code
config {
type: "table",
schema: "intermediate",
name: "scv",
tags: ["scv"],
description: "Single customer view",
}
SELECT
ID,
Name,
Email
FROM
${ref("google_sheet_table")}
Running this query in Dataform will give me:
bigquery error: Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials. at [3:1]
This is due to the Application Default Credentials authentication not containing the correct scopes to perform the Google Sheets operation, even if the Service Account running the request contains the correct level of access to the Sheet.
If I was to perform this operation with a service account that had the correct Domain Wide Delegation or API access in Google Workspaces with this scope: 'https://www.googleapis.com/auth/spreadsheets' provided in the authentication request, the query would run as intended.
In python I can create the correct credential object without using Service Account keys as follows:
import requests
from google.oauth2 import service_account
from google.auth import impersonated_credentials
from google.auth import default, iam
from google.auth.transport import requests
TOKEN_URI = 'https://accounts.google.com/o/oauth2/token'
SCOPES = ['https://www.googleapis.com/auth/drive','https://www.googleapis.com/auth/admin.directory.user.readonly','https://www.googleapis.com/auth/cloud-platform']
def delegated_credentials(credentials, subject, scopes):
try:
# If we are using service account credentials from json file
# this will work
updated_credentials = credentials.with_subject(subject).with_scopes(scopes)
except AttributeError:
# This exception is raised if we are using default credentials
request = requests.Request()
# Refresh the default credentials. This ensures that the information
# about this account, notably the email, is populated.
credentials.refresh(request)
# Create an IAM signer using the default credentials.
signer = iam.Signer(
request,
credentials,
credentials.service_account_email
)
# Create OAuth 2.0 Service Account credentials using the IAM-based
# signer and the bootstrap_credential's service account email.
updated_credentials = service_account.Credentials(
signer,
credentials.service_account_email,
TOKEN_URI,
scopes=scopes,
subject=subject
)
except Exception:
raise
return updated_credentials
def main():
# Default user credentials OR default SA credentials requires Service Account Token Creator and Service Account User
credentials, _ = default()
# Requires sevice account token creator for applied as a permission on itself - so that it can sign its own request
service_account_with_delegated_admin = '<SERVICE_ACCOUNT_EMAIL>'
# User must have permissions to perform the required tasks - Not specifically required for this task if the SA has the Google Sheet shared to them
user_to_impersonate_in_workspaces = '<WORKSPACE_EMAIL>'
try:
# Impersonates the service account using the default user/sa
target_credentials = impersonated_credentials.Credentials(
source_credentials=credentials,
target_principal=service_account_with_delegated_admin,
target_scopes=SCOPES
)
# Creates delegate authentication to the workspaces users, using the impersonated SA
workspaces_credentials = delegated_credentials(target_credentials, user_to_impersonate_in_workspaces, SCOPES)
except Exception as e:
print('Failed', e)
if __name__ == "__main__":
# User / Default > Impersonated SA > Delegated User in Workspaces
main()
I am wondering if dynamic scopes will become something that the Dataform Core library will support as more customers and teams begin to lean on External Tables like Google Sheets in their workflows.
Currently the simplest solution is to create a Scheduled Query that WRITE TRUNCATES
the content from the Sheet into a Standard BQ Table which is then utilised as a declare in Dataform for further operations, which leads to potential out of order processing due to Dataform not being able to control the table so more assertions and tests are required.