google-research/text-to-text-transfer-transformer

Error permission denied model in GCS

giappham opened this issue · 11 comments

Google just updated the authentication method on April 1st, which leads to an error when accessing the model in Cloud Storage. How can this be fixed?

InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

From /job:worker/replica:0/task:0:
Unsuccessful TensorSliceReader constructor: Failed to get matching files on gs://test_model/model.ckpt-524288: Permission denied: Error executing an HTTP request: HTTP response code 403 with body '{
"error": {
"code": 403,
"message": "service-495559152420@cloud-tpu.iam.gserviceaccount.com does not have storage.objects.list access to the Google Cloud Storage bucket.",
"errors": [
{
"message": "service-495559152420@cloud-tpu.iam.gserviceaccount.com does not have storage.objects.list access to the Google Cloud Storage bucket.",
"domain": "global",
"reason": "forbidden"
}
]
}
}
'

Hello,

The authentication call tensorflow_gcs_config.configure_gcs_from_colab_auth() raises the following error:

[screenshot: error_tpu_colab]

Please let us know if you have any suggestion to fix the issue.

Thank you in advance.

I no longer see the adc.json file since Google upgraded Colab. Previously, Google authenticated via a key file; now you just click Allow.

You can go to the Google Cloud console menu --> IAM & Admin --> and add the Storage Admin role for the principal.
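The same grant can be scripted instead of clicked through the console. A minimal sketch using the google-cloud-storage client; the bucket name and service-account email are placeholders, and roles/storage.admin could be narrowed (e.g. to roles/storage.objectAdmin) if you prefer:

```python
def storage_admin_binding(service_account_email):
    """Build an IAM binding granting Storage Admin to one service account."""
    return {
        "role": "roles/storage.admin",
        "members": {f"serviceAccount:{service_account_email}"},
    }

def grant_storage_admin(bucket_name, service_account_email):
    """Append the binding to the bucket's IAM policy.

    Deferred import: google-cloud-storage is only needed when actually granting.
    """
    from google.cloud import storage
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(storage_admin_binding(service_account_email))
    bucket.set_iam_policy(policy)
```

For example, `grant_storage_admin("test_model", "service-495559152420@cloud-tpu.iam.gserviceaccount.com")` would mirror the console steps above, assuming you are authenticated as a project owner.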

Hello,

Thank you for your answer. Unfortunately, I still have the same issue. I am running the following code in Google Colab:

print("Installing dependencies...")
%tensorflow_version 2.x
#!pip install -q tensorflow==2.8
#!pip install -q tensorflow-gcs-config==2.8
!pip install -q t5

import functools
import os
import time
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

import tensorflow.compat.v1 as tf
import tensorflow_datasets as tfds
import gin
import t5
ON_CLOUD = True

if ON_CLOUD:
  print("Setting up GCS access...")
  import tensorflow_gcs_config
  from google.colab import auth
  # Set credentials for GCS reading/writing from Colab and TPU.
  TPU_TOPOLOGY = "v3-8"  # v3-8
  try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    TPU_ADDRESS = tpu.get_master()
    print('Running on TPU:', TPU_ADDRESS)
  except ValueError:
    raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')
  auth.authenticate_user()
  tf.config.experimental_connect_to_host(TPU_ADDRESS)
  tensorflow_gcs_config.configure_gcs_from_colab_auth()

tf.disable_v2_behavior()

# Improve logging.
from contextlib import contextmanager
import logging as py_logging

if ON_CLOUD:
  tf.get_logger().propagate = False
  py_logging.root.setLevel('INFO')

@contextmanager
def tf_verbosity_level(level):
  og_level = tf.logging.get_verbosity()
  tf.logging.set_verbosity(level)
  yield
  tf.logging.set_verbosity(og_level)

Thank you in advance.

Hello, my code is working.

In the code, I commented out these lines:

tf.config.experimental_connect_to_host(TPU_ADDRESS)
tensorflow_gcs_config.configure_gcs_from_colab_auth()

Then, each bucket has some permissions, as the image shows:

[screenshot: sol_1]

So I added the permissions indicated in the following image:

[screenshot: sol_2]

Most importantly, I had to make my bucket public. Otherwise, it did not work.

You can go to the Google Cloud console menu --> IAM & Admin --> and add the Storage Admin role for the principal.

Let me clarify this. We have to grant the appropriate storage role to the TPU service account: service-495559152420@cloud-tpu.iam.gserviceaccount.com. In our case, Storage Admin alone solved the problem.
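To be sure which account needs the role, you can pull it straight out of the 403 body quoted at the top of this issue. A small helper; the regexes are my own, not part of any API:

```python
import json
import re

def parse_gcs_403(body):
    """Return (service_account, missing_permission) from a GCS 403 error body."""
    msg = json.loads(body)["error"]["message"]
    account = re.search(r"(\S+@\S+\.gserviceaccount\.com)", msg)
    perm = re.search(r"does not have (\S+) access", msg)
    return (account.group(1) if account else None,
            perm.group(1) if perm else None)

# The body from the traceback above:
body = json.dumps({"error": {
    "code": 403,
    "message": ("service-495559152420@cloud-tpu.iam.gserviceaccount.com "
                "does not have storage.objects.list access to the "
                "Google Cloud Storage bucket."),
}})
print(parse_gcs_403(body))
# → ('service-495559152420@cloud-tpu.iam.gserviceaccount.com', 'storage.objects.list')
```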

[screenshot: image_2022_04_12T03_50_52_489Z]

I have made my bucket public by giving the Storage Admin role to allUsers, just like @JessicaLopezEspejel's screenshot shows, but it still gives me FileNotFoundError: [Errno 2] No such file or directory: '/content/adc.json'.
Does anyone know why that might be? Thanks.
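One way to make that failure explicit: configure_gcs_from_colab_auth() reads the credentials file that Colab used to write to /content/adc.json, so you can guard the call on that file existing. The path and the fallback behavior here are assumptions based on this thread, not documented guarantees:

```python
import os

def configure_gcs_if_possible(configure_fn, adc_path="/content/adc.json"):
    """Run the GCS-config step only when the credentials file is present.

    Returns True if configure_fn was called, False if we fell back to
    relying on bucket IAM permissions alone.
    """
    if os.path.exists(adc_path):
        configure_fn()
        return True
    print(f"{adc_path} not found; skipping GCS config, relying on bucket IAM.")
    return False
```

In the notebook this would replace the bare call, e.g. `configure_gcs_if_possible(tensorflow_gcs_config.configure_gcs_from_colab_auth)`.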

@mshen2 can you try this code, please?

%tensorflow_version 2.x
import tensorflow.compat.v1 as tf
import tensorflow_datasets as tfds
import gin
ON_CLOUD = True

if ON_CLOUD:
  print("Setting up GCS access...")
  import tensorflow_gcs_config
  from google.colab import auth
  # Set credentials for GCS reading/writing from Colab and TPU.
  TPU_TOPOLOGY = "v3-8"  # v3-8
  try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    TPU_ADDRESS = tpu.get_master()
    print('Running on TPU:', TPU_ADDRESS)
  except ValueError:
    raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')
  auth.authenticate_user()
  #tf.config.experimental_connect_to_host(TPU_ADDRESS)
  #tensorflow_gcs_config.configure_gcs_from_colab_auth()

tf.disable_v2_behavior()

# Improve logging.
from contextlib import contextmanager
import logging as py_logging

if ON_CLOUD:
  tf.get_logger().propagate = False
  py_logging.root.setLevel('INFO')

@contextmanager
def tf_verbosity_level(level):
  og_level = tf.logging.get_verbosity()
  tf.logging.set_verbosity(level)
  yield
  tf.logging.set_verbosity(og_level)

Normally, if you modify the storage role for the TPU service account, it will work correctly. This is code from T5; it is what I am using, and it works well.

I'm told the solution is to authenticate with a service account instead due to https://developers.googleblog.com/2022/02/making-oauth-flows-safer.html#disallowed-oo

Can someone please try auth.authenticate_service_account instead of auth.authenticate_user to verify that it works? You can create the requested key with http://cloud/iam/docs/creating-managing-service-account-keys#creating.

@JessicaLopezEspejel Thank you, commenting out those two lines does work, and I can indeed write to and read from GCS on Colab. I am not entirely sure what those two lines do and whether they are needed later, but the code is working fine at this point.
@adarob The same error happens when using auth.authenticate_service_account(), and the link is actually down.

Hello everyone, I'm probably a bit late to this issue, but for me, adding the command below before setting up GCS access solved the problem.

os.environ['USE_AUTH_EPHEM'] = '0'
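In context, this workaround has to run before any authentication or GCS setup. A sketch of the ordering; USE_AUTH_EPHEM is the flag reported in this thread as being read by Colab's auth machinery, and the commented lines stand in for the earlier snippets:

```python
import os

# Opt out of ephemeral credentials *before* touching google.colab.auth,
# so the legacy flow writes /content/adc.json as it used to.
os.environ['USE_AUTH_EPHEM'] = '0'

# Then proceed as in the snippets above:
# from google.colab import auth
# auth.authenticate_user()
# tensorflow_gcs_config.configure_gcs_from_colab_auth()
```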