fsspec/gcsfs

GCSFileSystem does not accept token, file.json or instance of service credentials

ncclementi opened this issue · 13 comments

According to the docs at https://gcsfs.readthedocs.io/en/latest/index.html#credentials we should be able to pass either a .json file with credentials or an instance of Credentials, but this is currently not working.

MRE - json file

import gcsfs
import google.auth
from google.oauth2 import service_account

project_id = "my_project_idg"
key_file = "/Users/my_user/creds.json" #these are service account creds
res = gcsfs.GCSFileSystem(project_id=project_id, token=key_file).ls("/")
Traceback
_request non-retriable exception: Invalid argument., 400
Traceback (most recent call last):
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 114, in retry_request
    return await func(*args, **kwargs)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 422, in _request
    validate_response(status, contents, path, args)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 101, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Invalid argument., 400
Traceback (most recent call last):
  File "/Users/ncclementi/Documents/dask-bigquery-demo/demo.py", line 19, in <module>
    res = gcsfs.GCSFileSystem(project_id=project_id, token=key_file).ls("/")
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 115, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 100, in sync
    raise return_result
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 55, in _runner
    result[0] = await coro
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 842, in _ls
    out = await self._list_buckets()
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 619, in _list_buckets
    page = await self._call("GET", "b", project=self.project, json_out=True)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 429, in _call
    status, headers, info, contents = await self._request(
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/decorator.py", line 221, in fun
    return await caller(func, *(extras + args), **kw)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 149, in retry_request
    raise e
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 114, in retry_request
    return await func(*args, **kwargs)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 422, in _request
    validate_response(status, contents, path, args)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 101, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Invalid argument., 400

MRE with creds instance

import gcsfs
import google.auth
from google.oauth2 import service_account

project_id = "my_project_idg"
key_file = "/Users/my_user/creds.json" #these are service account creds

sa_creds = service_account.Credentials.from_service_account_file(key_file)
res = gcsfs.GCSFileSystem(project_id=project_id, token=sa_creds).ls("/")
Traceback
_request out of retries on exception: ('invalid_scope: Invalid OAuth scope or ID token audience provided.', {'error': 'invalid_scope', 'error_description': 'Invalid OAuth scope or ID token audience provided.'})
Traceback (most recent call last):
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 114, in retry_request
    return await func(*args, **kwargs)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 413, in _request
    headers=self._get_headers(headers),
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 392, in _get_headers
    self.credentials.apply(out)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/credentials.py", line 185, in apply
    self.maybe_refresh()
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/credentials.py", line 180, in maybe_refresh
    self.credentials.refresh(req)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/service_account.py", line 425, in refresh
    access_token, expiry, _ = _client.jwt_grant(
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 303, in jwt_grant
    response_data = _token_endpoint_request(
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 274, in _token_endpoint_request
    _handle_error_response(response_data, retryable_error)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 73, in _handle_error_response
    raise exceptions.RefreshError(
google.auth.exceptions.RefreshError: ('invalid_scope: Invalid OAuth scope or ID token audience provided.', {'error': 'invalid_scope', 'error_description': 'Invalid OAuth scope or ID token audience provided.'})
Traceback (most recent call last):
  File "/Users/ncclementi/Documents/dask-bigquery-demo/demo.py", line 19, in <module>
    res = gcsfs.GCSFileSystem(project_id=project_id, token=sa_creds).ls("/")
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 115, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 100, in sync
    raise return_result
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/fsspec/asyn.py", line 55, in _runner
    result[0] = await coro
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 842, in _ls
    out = await self._list_buckets()
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 619, in _list_buckets
    page = await self._call("GET", "b", project=self.project, json_out=True)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 429, in _call
    status, headers, info, contents = await self._request(
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/decorator.py", line 221, in fun
    return await caller(func, *(extras + args), **kw)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 144, in retry_request
    raise e
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/retry.py", line 114, in retry_request
    return await func(*args, **kwargs)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 413, in _request
    headers=self._get_headers(headers),
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/core.py", line 392, in _get_headers
    self.credentials.apply(out)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/credentials.py", line 185, in apply
    self.maybe_refresh()
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/gcsfs/credentials.py", line 180, in maybe_refresh
    self.credentials.refresh(req)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/service_account.py", line 425, in refresh
    access_token, expiry, _ = _client.jwt_grant(
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 303, in jwt_grant
    response_data = _token_endpoint_request(
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 274, in _token_endpoint_request
    _handle_error_response(response_data, retryable_error)
  File "/Users/ncclementi/mambaforge/envs/test-dask-bq-pip/lib/python3.9/site-packages/google/oauth2/_client.py", line 73, in _handle_error_response
    raise exceptions.RefreshError(
google.auth.exceptions.RefreshError: ('invalid_scope: Invalid OAuth scope or ID token audience provided.', {'error': 'invalid_scope', 'error_description': 'Invalid OAuth scope or ID token audience provided.'})

cc: @j-bennet since we were working on this together

Please state your versions of gcsfs, fsspec and relevant google libraries. It's also good to check whether any of the latter have had recent releases that might have fouled things.

Versions:

fsspec                    2023.5.0           pyh1a96a4e_0    conda-forge
gcsfs                     2023.5.0           pyhd8ed1ab_0    conda-forge
google-api-core           2.11.0             pyhd8ed1ab_0    conda-forge
google-api-core-grpc      2.11.0               hd8ed1ab_0    conda-forge
google-auth               2.18.0             pyh1a96a4e_0    conda-forge
google-auth-oauthlib      1.0.0              pyhd8ed1ab_0    conda-forge
google-cloud-bigquery     3.10.0             pyhd8ed1ab_0    conda-forge
google-cloud-bigquery-core 3.10.0             pyhd8ed1ab_0    conda-forge
google-cloud-bigquery-storage 2.18.0             pyh1a96a4e_0    conda-forge
google-cloud-bigquery-storage-core 2.18.0             pyh1a96a4e_0    conda-forge
google-cloud-core         2.3.2              pyhd8ed1ab_0    conda-forge
google-cloud-storage      2.9.0              pyh1a96a4e_0    conda-forge
google-crc32c             1.1.2            py39h17a57db_4    conda-forge
google-resumable-media    2.5.0              pyhd8ed1ab_0    conda-forge
googleapis-common-protos  1.57.1             pyhd8ed1ab_0    conda-forge
grpcio                    1.54.2           py39hb198ff7_0    conda-forge
grpcio-status             1.52.0             pyhd8ed1ab_0    conda-forge
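For environments that are not managed by conda, a similar version report can be produced from Python itself. This is a minimal sketch using only the standard library; the helper name `installed_versions` and the package list are illustrative choices, not something taken from this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Illustrative list: the packages most relevant to gcsfs authentication.
    wanted = ["gcsfs", "fsspec", "google-auth", "google-auth-oauthlib", "google-cloud-storage"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name:<24} {ver or 'not installed'}")
```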

Looking at the API docs for GCSFileSystem (https://gcsfs.readthedocs.io/en/latest/api.html#gcsfs.core.GCSFileSystem) it looks like the kwarg is project=, not project_id=. Do things work if you use project= instead?
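The reproduction above did not fail when the filesystem was constructed, only later at `ls`, which is consistent with the misspelled keyword being accepted rather than rejected. The sketch below illustrates that failure mode with a hypothetical stand-in class (`FakeFileSystem` is not gcsfs, and whether gcsfs forwards unknown keywords through `**kwargs` is an assumption here), plus a hypothetical `check_kwargs` helper that catches such typos early.

```python
import inspect


def check_kwargs(func, **kwargs):
    """Raise TypeError for keyword names that `func` does not declare explicitly."""
    params = inspect.signature(func).parameters
    # Only explicitly named parameters count; a **kwargs catch-all is ignored.
    named = {
        name
        for name, p in params.items()
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    }
    unknown = sorted(set(kwargs) - named)
    if unknown:
        raise TypeError(f"unexpected keyword(s) {unknown}; declared: {sorted(named)}")
    return kwargs


class FakeFileSystem:
    """Hypothetical stand-in, not gcsfs: extra keywords vanish into **kwargs."""

    def __init__(self, project=None, token=None, **kwargs):
        self.project = project
        self.token = token


# The misspelled keyword is accepted silently, so the project is never set.
fs = FakeFileSystem(project_id="my-project")
print(fs.project)
```

Calling `check_kwargs(FakeFileSystem, project_id="my-project")` raises a `TypeError` that names the unknown keyword, while `check_kwargs(FakeFileSystem, project="my-project")` passes the arguments through unchanged.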

@jrbourbeau

Nice catch, this solves the problem partially. With this fixed I see the following behavior:

GCSFileSystem does not accept token or instance of service account credentials.

  1. path to service account JSON file works ✅
import gcsfs

project_id = "project-id"
creds_file_sa = "service_account.json"
gcsfs.GCSFileSystem(project=project_id, token=creds_file_sa).ls("/")
  2. path to application default credentials (ADC) works ✅
creds_file_adc = "application_default_credentials.json"
gcsfs.GCSFileSystem(project=project_id, token=creds_file_adc).ls("/")
  3. an instance of google.oauth2.credentials.Credentials works ✅
import google.auth

credentials_adc, _ = google.auth.default()
gcsfs.GCSFileSystem(project=project_id, token=credentials_adc).ls("/")
  4. an instance of google.oauth2.service_account.Credentials doesn't work ❌
import json
from google.oauth2.service_account import Credentials

# Read the service account key file and close it afterwards
with open(creds_file_sa) as f:
    creds_dict = json.load(f)
credentials_sa = Credentials.from_service_account_info(info=creds_dict)
gcsfs.GCSFileSystem(project=project_id, token=credentials_sa).ls("/")

It fails with:

RefreshError: ('invalid_scope: Invalid OAuth scope or ID token audience provided.', {'error': 'invalid_scope', 'error_description': 'Invalid OAuth scope or ID token audience provided.'})
  5. a token from ADC instance doesn't work ❌
import google.auth.transport.requests

credentials_adc.refresh(google.auth.transport.requests.Request())
# after this, credentials_adc.token is a non-empty string
gcsfs.GCSFileSystem(project=project_id, token=credentials_adc.token).ls("/")

raises:

FileNotFoundError: ya29.a0AWY7CXXXXXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXX

For our use case, case 5 would be nice, since we would not have to provision those files or pass around Credentials objects (they can't be pickled). But it's weird that case 4 doesn't work either.

@ncclementi I figured out what we were doing wrong with case 4: we have to provide scopes when creating the SA instance.

  4. an instance of google.oauth2.service_account.Credentials works ✅
from google.oauth2 import service_account

credentials_sa = service_account.Credentials.from_service_account_file('service_account.json', scopes=["https://www.googleapis.com/auth/devstorage.read_write"])
gcsfs.GCSFileSystem(project=project_id, token=credentials_sa).ls("/")

Only two cases left that don't work:

  1. a token from ADC instance doesn't work ❌
  2. a token from service account instance doesn't work ❌

Sorry @martindurant, we're still figuring out the right way to do this. :) Should a token work?

I have just tested with a service account key JSON file path and successfully listed files in a restricted bucket. Are you doing something different? The service account was specifically listed as a reader on the bucket, rather than being assigned any role, but this shows that service accounts can indeed authenticate.

For the "invalid scope" error when using a Credentials instance, you need to set the scopes. Perhaps gcsfs should do this, but we are (apparently) assuming that the instance is already fully configured.

credentials_sa._scopes = ["https://www.googleapis.com/auth/devstorage.read_only"]

(docs suggest there should be a .createScoped method, but I don't see one)
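In the Python google-auth library, scoped credentials expose a public `with_scopes()` method and a `requires_scopes` property, which avoid writing to the private `_scopes` attribute. The following is a minimal sketch rather than anything gcsfs does itself; the helper name `ensure_scoped` and the choice of the read/write storage scope as default are assumptions made for illustration.

```python
# Scope URL taken from the working example earlier in this thread.
GCS_READ_WRITE = "https://www.googleapis.com/auth/devstorage.read_write"


def ensure_scoped(credentials, scopes=(GCS_READ_WRITE,)):
    """Return credentials carrying `scopes` if they currently need scopes.

    Relies only on google-auth's public scoped-credentials interface:
    the `requires_scopes` property and the `with_scopes()` method.
    Objects without that interface are returned unchanged.
    """
    if getattr(credentials, "requires_scopes", False):
        # with_scopes() returns a new credentials object; the original is untouched.
        return credentials.with_scopes(list(scopes))
    return credentials


# Intended usage (requires google-auth and gcsfs, so left as comments):
# from google.oauth2 import service_account
# creds = service_account.Credentials.from_service_account_file("service_account.json")
# gcsfs.GCSFileSystem(project=project_id, token=ensure_scoped(creds)).ls("/")
```

Credentials that already carry scopes pass through untouched, so the helper can be applied unconditionally before handing the object to `GCSFileSystem`.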

@martindurant the original issue description is not correct anymore, but I can't edit that.

What is actually not working is described in this comment:

#553 (comment)

a token from service account instance doesn't work

I don't know that we can make a raw token work; we need to know whether it is still valid and whether it should be refreshed. You will need to make these into a Credentials, I think.

I don't know that we can make a raw token work; we need to know whether it is still valid and whether it should be refreshed. You will need to make these into a Credentials, I think.

What about a json dict, should that work?

It will still try to refresh. I think what you might need, is to subclass gcsfs.credentials.GoogleCredentials, which needs to expose an apply(head: dict) function, or make a PR for the existing class to accept a non-refreshable raw token, presumably in _connect_token().

(a dict would be mapped to a service account or principal token just as a JSON file would, in _dict_to_credentials)
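As a rough illustration of the `apply(head: dict)` idea, here is a standalone sketch of a non-refreshable token holder. The class name `StaticTokenCredentials` is hypothetical, it does not subclass `gcsfs.credentials.GoogleCredentials`, and how such an object would be wired into gcsfs (for example via `_connect_token()`) is left open; it only shows the header-setting contract.

```python
class StaticTokenCredentials:
    """Hypothetical holder for a raw access token that is never refreshed."""

    def __init__(self, token):
        if not token:
            raise ValueError("token must be a non-empty string")
        self.token = token

    def apply(self, out):
        """Add the bearer token to the outgoing request headers in place."""
        # No validity check or refresh happens here: once the token expires,
        # requests will fail and the caller has to supply a new token.
        out["Authorization"] = f"Bearer {self.token}"
        return out


headers = {}
StaticTokenCredentials("example-token").apply(headers)
print(headers)
```

The trade-off raised above still holds: without expiry information the object cannot tell whether the token is still valid, so this only suits short-lived jobs where the caller controls token lifetime.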

@martindurant Thank you, I think we can close this issue.

If you find a path that works for you to pass tokens in directly, please contribute a PR, as this can be useful to others too.