GCSFileSystem does not accept token, file.json or instance of service credentials
ncclementi opened this issue · 13 comments
According to the docs (https://gcsfs.readthedocs.io/en/latest/index.html#credentials), we should be able to pass the path to a .json credentials file or an instance of Credentials, but neither currently works.
MRE: JSON file
import gcsfs
import google.auth
from google.oauth2 import service_account
project_id = "my_project_id"
key_file = "/Users/my_user/creds.json"  # these are service account creds
res = gcsfs.GCSFileSystem(project_id=project_id, token=key_file).ls("/")
Traceback
_request non-retriable exception: Invalid argument., 400
Traceback (most recent call last):
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 114, in retry_request
return await func(*args, **kwargs)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 422, in _request
validate_response(status, contents, path, args)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 101, in validate_response
raise HttpError(error)
gcsfs.retry.HttpError: Invalid argument., 400
Traceback (most recent call last):
File "/Users/ncclementi/Documents/dask-bigquery-demo/demo.py", line 19, in <module>
res = gcsfs.GCSFileSystem(project_id=project_id, token=key_file).ls("/")
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 115, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 100, in sync
raise return_result
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 55, in _runner
result[0] = await coro
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 842, in _ls
out = await self._list_buckets()
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 619, in _list_buckets
page = await self._call("GET", "b", project=self.project, json_out=True)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 429, in _call
status, headers, info, contents = await self._request(
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/decorator.py", line 221, in fun
return await caller(func, *(extras + args), **kw)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 149, in retry_request
raise e
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 114, in retry_request
return await func(*args, **kwargs)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 422, in _request
validate_response(status, contents, path, args)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 101, in validate_response
raise HttpError(error)
gcsfs.retry.HttpError: Invalid argument., 400
MRE: Credentials instance
import gcsfs
import google.auth
from google.oauth2 import service_account
project_id = "my_project_id"
key_file = "/Users/my_user/creds.json"  # these are service account creds
sa_creds = service_account.Credentials.from_service_account_file(key_file)
res = gcsfs.GCSFileSystem(project_id=project_id, token=sa_creds).ls("/")
Traceback
_request out of retries on exception: ('invalid_scope: Invalid OAuth scope or ID token audience provided.', {'error': 'invalid_scope', 'error_description': 'Invalid OAuth scope or ID token audience provided.'})
Traceback (most recent call last):
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 114, in retry_request
return await func(*args, **kwargs)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 413, in _request
headers=self._get_headers(headers),
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 392, in _get_headers
self.credentials.apply(out)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/credentials.py", line 185, in apply
self.maybe_refresh()
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/credentials.py", line 180, in maybe_refresh
self.credentials.refresh(req)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/service_account.py", line 425, in refresh
access_token, expiry, _ = _client.jwt_grant(
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 303, in jwt_grant
response_data = _token_endpoint_request(
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 274, in _token_endpoint_request
_handle_error_response(response_data, retryable_error)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 73, in _handle_error_response
raise exceptions.RefreshError(
google.auth.exceptions.RefreshError: ('invalid_scope: Invalid OAuth scope or ID token audience provided.', {'error': 'invalid_scope', 'error_description': 'Invalid OAuth scope or ID token audience provided.'})
Traceback (most recent call last):
File "/Users/ncclementi/Documents/dask-bigquery-demo/demo.py", line 19, in <module>
res = gcsfs.GCSFileSystem(project_id=project_id, token=sa_creds).ls("/")
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 115, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 100, in sync
raise return_result
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 55, in _runner
result[0] = await coro
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 842, in _ls
out = await self._list_buckets()
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 619, in _list_buckets
page = await self._call("GET", "b", project=self.project, json_out=True)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 429, in _call
status, headers, info, contents = await self._request(
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/decorator.py", line 221, in fun
return await caller(func, *(extras + args), **kw)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 144, in retry_request
raise e
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 114, in retry_request
return await func(*args, **kwargs)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 413, in _request
headers=self._get_headers(headers),
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 392, in _get_headers
self.credentials.apply(out)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/credentials.py", line 185, in apply
self.maybe_refresh()
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/credentials.py", line 180, in maybe_refresh
self.credentials.refresh(req)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/service_account.py", line 425, in refresh
access_token, expiry, _ = _client.jwt_grant(
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 303, in jwt_grant
response_data = _token_endpoint_request(
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 274, in _token_endpoint_request
_handle_error_response(response_data, retryable_error)
File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 73, in _handle_error_response
raise exceptions.RefreshError(
google.auth.exceptions.RefreshError: ('invalid_scope: Invalid OAuth scope or ID token audience provided.', {'error': 'invalid_scope', 'error_description': 'Invalid OAuth scope or ID token audience provided.'})
cc: @j-bennet since we were working on this together
Please state your versions of gcsfs, fsspec and relevant google libraries. It's also good to check whether any of the latter have had recent releases that might have fouled things.
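A quick way to gather the versions requested above is a small stdlib-only script. This is a sketch: installed_versions is a hypothetical helper, the package list is only an example, and it reads installed distribution metadata via importlib.metadata, so packages installed without that metadata may show up as missing.

```python
from importlib import metadata

# Example distribution names; adjust to the libraries relevant to the report.
PACKAGES = [
    "gcsfs", "fsspec", "google-auth", "google-auth-oauthlib",
    "google-cloud-storage", "google-api-core",
]


def installed_versions(names):
    """Return {name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:25} {version or 'not installed'}")
```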
Versions:
fsspec 2023.5.0 pyh1a96a4e_0 conda-forge
gcsfs 2023.5.0 pyhd8ed1ab_0 conda-forge
google-api-core 2.11.0 pyhd8ed1ab_0 conda-forge
google-api-core-grpc 2.11.0 hd8ed1ab_0 conda-forge
google-auth 2.18.0 pyh1a96a4e_0 conda-forge
google-auth-oauthlib 1.0.0 pyhd8ed1ab_0 conda-forge
google-cloud-bigquery 3.10.0 pyhd8ed1ab_0 conda-forge
google-cloud-bigquery-core 3.10.0 pyhd8ed1ab_0 conda-forge
google-cloud-bigquery-storage 2.18.0 pyh1a96a4e_0 conda-forge
google-cloud-bigquery-storage-core 2.18.0 pyh1a96a4e_0 conda-forge
google-cloud-core 2.3.2 pyhd8ed1ab_0 conda-forge
google-cloud-storage 2.9.0 pyh1a96a4e_0 conda-forge
google-crc32c 1.1.2 py39h17a57db_4 conda-forge
google-resumable-media 2.5.0 pyhd8ed1ab_0 conda-forge
googleapis-common-protos 1.57.1 pyhd8ed1ab_0 conda-forge
grpcio 1.54.2 py39hb198ff7_0 conda-forge
grpcio-status 1.52.0 pyhd8ed1ab_0 conda-forge
Looking at the API docs for GCSFileSystem (https://gcsfs.readthedocs.io/en/latest/api.html#gcsfs.core.GCSFileSystem), it looks like the kwarg is project=, not project_id=. Do things work if you use project= instead?
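Since the MRE ran without a TypeError, the constructor evidently accepted project_id= and ignored it. Assuming that is because GCSFileSystem collects unrecognized keyword arguments in a **kwargs catch-all, a small stdlib check can flag names a function does not declare explicitly. unknown_kwargs and the make_fs stand-in are hypothetical, for illustration only.

```python
import inspect


def unknown_kwargs(func, **kwargs):
    """Return keyword names that func does not declare explicitly.

    Names that would only land in a **kwargs catch-all are reported, since
    those are the ones a typo can slip through silently.
    """
    params = inspect.signature(func).parameters
    explicit = {
        name for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(set(kwargs) - explicit)


# Hypothetical stand-in shaped like a constructor that forwards extra options:
# project and token are explicit, anything else is swallowed.
def make_fs(project=None, token=None, **storage_options):
    return {"project": project, "token": token, "extra": storage_options}


print(unknown_kwargs(make_fs, project_id="my_project_id", token="creds.json"))
```

Pointed at gcsfs.GCSFileSystem.__init__ instead of make_fs, the same check should report project_id, though that depends on the installed version's signature.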
Nice catch, this partially solves the problem. With this fixed, I see the following behavior:
GCSFileSystem does not accept a raw token or an instance of service account credentials.
- path to service account JSON file works ✅
import gcsfs
project_id = "project-id"
creds_file_sa = "service_account.json"
gcsfs.GCSFileSystem(project=project_id, token=creds_file_sa).ls("/")
- path to application default credentials (ADC) works ✅
creds_file_adc = "application_default_credentials.json"
gcsfs.GCSFileSystem(project=project_id, token=creds_file_adc).ls("/")
- an instance of google.oauth2.credentials.Credentials works ✅
import google.auth
credentials_adc, _ = google.auth.default()
gcsfs.GCSFileSystem(project=project_id, token=credentials_adc).ls("/")
- an instance of google.oauth2.service_account.Credentials doesn't work ❌
import json
from google.oauth2.service_account import Credentials
with open(creds_file_sa) as f:
    creds_dict = json.load(f)
credentials_sa = Credentials.from_service_account_info(info=creds_dict)
gcsfs.GCSFileSystem(project=project_id, token=credentials_sa).ls("/")
It fails with:
RefreshError: ('invalid_scope: Invalid OAuth scope or ID token audience provided.', {'error': 'invalid_scope', 'error_description': 'Invalid OAuth scope or ID token audience provided.'})
- a raw token taken from the ADC instance doesn't work ❌
import google.auth.transport.requests
credentials_adc.refresh(google.auth.transport.requests.Request())
# after this, credentials_adc.token is a non-empty string
gcsfs.GCSFileSystem(project=project_id, token=credentials_adc.token).ls("/")
raises:
FileNotFoundError: ya29.a0AWY7CXXXXXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXX
For our use case, case 5 (passing a raw token) would be nice, since we would not have to provision those files or pass around Credentials objects (they can't be pickled). But it's weird that case 4 (a service account Credentials instance) doesn't work either.
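The FileNotFoundError in case 5 suggests that a plain string token is taken to be a path to a JSON key file. The hypothetical classify_token_arg below mimics the behavior observed in this thread; it is not gcsfs's implementation and it ignores the special strings gcsfs documents (such as "anon").

```python
import os


def classify_token_arg(token):
    """Hypothetical classifier mimicking what this thread observed.

    Not gcsfs's implementation: it only illustrates why a raw access-token
    string ends up being treated as a file path.
    """
    if token is None:
        return "default lookup"  # gcsfs's own fallback chain is not modeled here
    if isinstance(token, dict):
        return "credentials info dict"
    if isinstance(token, str):
        # Every string is assumed to be a path to a JSON key file, so an
        # access token like "ya29...." is looked up on disk and not found.
        if not os.path.exists(token):
            raise FileNotFoundError(token)
        return "path to JSON credentials file"
    if callable(getattr(token, "refresh", None)):
        return "Credentials instance"
    raise TypeError(f"unsupported token type: {type(token).__name__}")


for arg in ("ya29.example-token", {"type": "service_account"}):
    try:
        print(repr(arg)[:30], "->", classify_token_arg(arg))
    except FileNotFoundError as exc:
        print(repr(arg)[:30], "-> FileNotFoundError:", exc)
```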
@ncclementi I figured out what we were doing wrong with case 4: we have to provide scopes when creating the SA instance.
- an instance of google.oauth2.service_account.Credentials works ✅
from google.oauth2 import service_account
credentials_sa = service_account.Credentials.from_service_account_file(
    "service_account.json",
    scopes=["https://www.googleapis.com/auth/devstorage.read_write"],
)
gcsfs.GCSFileSystem(project=project_id, token=credentials_sa).ls("/")
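The invalid_scope error in the earlier traceback is raised from jwt_grant, i.e. during the OAuth 2.0 JWT-bearer exchange a service account uses to obtain an access token. In that exchange the signed assertion carries a space-delimited scope claim, so credentials built without scopes have nothing to request. The sketch below only assembles the unsigned claim set; jwt_grant_claims and the service account email are hypothetical, and signing and the HTTP POST are left out.

```python
import time

TOKEN_URI = "https://oauth2.googleapis.com/token"  # Google's OAuth 2.0 token endpoint
RW_SCOPE = "https://www.googleapis.com/auth/devstorage.read_write"


def jwt_grant_claims(client_email, scopes, now=None, lifetime=3600):
    """Build the unsigned claim set for a service-account JWT-bearer grant.

    Sketch only: real code signs these claims with the private key from the
    JSON file and POSTs the assertion to TOKEN_URI. Empty scopes are rejected
    here, consistent with the invalid_scope reply seen in the traceback.
    """
    if not scopes:
        raise ValueError("no scopes: the token endpoint would answer invalid_scope")
    iat = int(time.time() if now is None else now)
    return {
        "iss": client_email,
        "scope": " ".join(scopes),  # OAuth scopes are space-delimited
        "aud": TOKEN_URI,
        "iat": iat,
        "exp": iat + lifetime,
    }


claims = jwt_grant_claims("sa@my-project.iam.gserviceaccount.com", [RW_SCOPE])
```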
Only two cases left that don't work:
- a raw token from an ADC instance doesn't work ❌
- a raw token from a service account instance doesn't work ❌
Sorry @martindurant, we're still figuring out the right way to do this. :) Should a raw token work?
I have just tested with a service account key JSON file path and successfully listed files in a restricted bucket. Are you doing something different? The service account was specifically listed as a reader on the bucket rather than being granted any other role, but this shows that service accounts can indeed authenticate.
For the "invalid scope" error when using a Credentials instance, you need to set the scopes. Perhaps gcsfs should do this, but we are (apparently) assuming that the instance is already fully configured.
credentials_sa._scopes = ["https://www.googleapis.com/auth/devstorage.read_only"]
(docs suggest there should be a .createScoped method, but I don't see one)
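Rather than assigning the private _scopes attribute, google-auth's scoped credentials expose a requires_scopes property and a with_scopes() method that returns a scoped copy (google.auth.credentials.with_scopes_if_required wraps the same check). ensure_storage_scope is a hypothetical helper in the spirit of "perhaps gcsfs should do this"; the mapping from access levels to Cloud Storage scopes assumes the three levels gcsfs exposes through its access= argument.

```python
# Cloud Storage OAuth scopes keyed by access level (assumed to mirror gcsfs's
# access= choices).
STORAGE_SCOPES = {
    "read_only": "https://www.googleapis.com/auth/devstorage.read_only",
    "read_write": "https://www.googleapis.com/auth/devstorage.read_write",
    "full_control": "https://www.googleapis.com/auth/devstorage.full_control",
}


def ensure_storage_scope(credentials, access="full_control"):
    """Return credentials that carry a storage scope (hypothetical helper).

    Uses the public requires_scopes / with_scopes pair of scoped google-auth
    credentials instead of writing _scopes directly. Credentials that do not
    need scopes (or are not scoped credentials at all) are returned unchanged.
    """
    if getattr(credentials, "requires_scopes", False):
        return credentials.with_scopes([STORAGE_SCOPES[access]])
    return credentials
```

With a real key file, this would be called as ensure_storage_scope(service_account.Credentials.from_service_account_file("service_account.json"), "read_only") before handing the result to GCSFileSystem.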
@martindurant the original issue description is no longer accurate, but I can't edit it.
What is actually not working is described in this comment:
a token from service account instance doesn't work
I don't know that we can make a raw token work: we need to know whether it is still valid and should be refreshed. You will need to make these into Credentials, I think.
What about a json dict, should that work?
It will still try to refresh. I think what you might need is to subclass gcsfs.credentials.GoogleCredentials, which needs to expose an apply(head: dict) function, or to make a PR for the existing class to accept a non-refreshable raw token, presumably in _connect_token().
(a dict would be mapped to a service account or principle token just as a JSON file would, in _dict_to_credentials)
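Google credential JSON files carry a "type" field: "service_account" for service account keys and "authorized_user" for user refresh-token credentials, such as the application_default_credentials.json written by gcloud auth application-default login. Below is a hypothetical helper that branches on that field the way the comment above describes; whether _dict_to_credentials checks exactly this field is an assumption.

```python
import json

# Values of the "type" field in Google credentials JSON files.
KNOWN_TYPES = {
    "service_account": "service account key",
    "authorized_user": "user refresh-token credentials",
}


def load_credentials_info(path):
    """Read a JSON credentials file into a plain dict (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


def credentials_kind(info):
    """Classify parsed credentials by their "type" field (hypothetical helper)."""
    kind = info.get("type")
    if kind not in KNOWN_TYPES:
        raise ValueError(f"unrecognized credentials type: {kind!r}")
    return KNOWN_TYPES[kind]
```

A dict loaded this way is plain JSON data, so unlike a Credentials object it can be pickled, but note that it carries the secret key material with it.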
@martindurant Thank you, I think we can close this issue.
If you find a path that works for you to pass tokens in directly, please contribute a PR, as this can be useful to others too.
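For anyone attempting the PR suggested above, here is a minimal sketch of a non-refreshable credentials object exposing the apply(head: dict) hook that gcsfs calls while building request headers (visible in the traceback as self.credentials.apply(out)). StaticTokenCredentials is hypothetical and deliberately not a subclass of gcsfs.credentials.GoogleCredentials, so it cannot be passed to GCSFileSystem as-is.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StaticTokenCredentials:
    """Hypothetical non-refreshable wrapper around a raw access token.

    It never refreshes: once the token expires, a new one must be supplied.
    """
    token: str
    expiry: Optional[float] = None  # POSIX timestamp, if known

    def expired(self, now=None):
        if self.expiry is None:
            return False  # unknown expiry: assume valid and let the server decide
        return (time.time() if now is None else now) >= self.expiry

    def apply(self, head: dict) -> dict:
        """Add the bearer token to a headers dict in place and return it."""
        if self.expired():
            raise RuntimeError("access token has expired and cannot be refreshed")
        head["Authorization"] = f"Bearer {self.token}"
        return head
```

Because it holds only a string and an optional float, it pickles cleanly, which speaks to the concern about passing Credentials objects around. The trade-off is the one raised above: with no refresh, the caller must supply a new token before the old one expires.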