google-research/text-to-text-transfer-transformer

Can't download GLUE RTE dataset

shunyuzh opened this issue · 1 comment

t5_mesh_transformer  \
  --model_dir="${MODEL_DIR}" \
  --t5_tfds_data_dir="${DATA_DIR}" \
  --gin_file="dataset.gin" \
  --gin_param="utils.run.mesh_shape = 'model:1,batch:1'" \
  --gin_param="utils.run.mesh_devices = ['gpu:0']" \
  --gin_param="MIXTURE_NAME = 'glue_rte_v002'" \
  --gin_file="./t5_data/small/operative_config.gin"

Using the script above, I can't download the RTE task dataset. However, I can download the MRPC dataset by replacing 'glue_rte_v002' with 'glue_mrpc_v002'.

Generating dataset glue (/home/shunyu/container/Project/t5_data/glue/glue/rte/1.0.0)
Downloading and preparing dataset glue/rte/1.0.0 (download: 680.81 KiB, generated: Unknown size, total: 680.81 KiB) to /home/shunyu/container/Project/t5_data/glue/glue/rte/1.0.0...
Dl Completed...: 0 url [00:00, ? url/s]
I0810 06:04:16.266083 139915764279104 download_manager.py:476] Downloading https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FRTE.zip?alt=media&token=5efa7e85-a0bb-4f19-8ea2-9e1840f077fb into /home/shunyu/container/Project/t5_data/glue/downloads/fire.goog.com_v0_b_mtl-sent-repr.apps.6LYu5E5vi2rqdhk1koV5_-GqVdFhgIxILgclq73PnGQ.zipalt=media&token=5efa7e85-a0bb-4f19-8ea2-9e1840f077fb.tmp.e6859f06c60544f7a5e3e3b8972b64ea...
Extraction completed...: 0 file [00:00, ? file/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Dl Completed...:   0%|          | 0/1 [00:00<?, ? url/s]
INFO:tensorflow:training_loop marked as finished
I0810 06:04:16.657165 139915764279104 error_handling.py:115] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W0810 06:04:16.657362 139915764279104 error_handling.py:149] Reraising captured error
Traceback (most recent call last):
  File "/anaconda/envs/t5/bin/t5_mesh_transformer", line 8, in <module>
    sys.exit(console_entry_point())
  File "/home/shunyu/container/Project/text-to-text-transfer-transformer/t5/models/mesh_transformer_main.py", line 283, in console_entry_point
    app.run(main)
  File "/home/shunyu/.local/lib/python3.8/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/home/shunyu/.local/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/home/shunyu/container/Project/text-to-text-transfer-transformer/t5/models/mesh_transformer_main.py", line 272, in main
    utils.run(
  File "/anaconda/envs/t5/lib/python3.8/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/anaconda/envs/t5/lib/python3.8/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/mesh_tensorflow/transformer/utils.py", line 2598, in run
    train_model_fn(estimator, vocabulary, sequence_length, batch_size,
  File "/anaconda/envs/t5/lib/python3.8/site-packages/mesh_tensorflow/transformer/utils.py", line 1815, in train_model
    estimator.train(input_fn=input_fn, max_steps=train_steps, hooks=hooks)
  File "/home/shunyu/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3110, in train
    rendezvous.raise_errors()
  File "/home/shunyu/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
    six.reraise(typ, value, traceback)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/shunyu/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3100, in train
    return super(TPUEstimator, self).train(
  File "/home/shunyu/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 349, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/shunyu/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/shunyu/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1201, in _train_model_default
    self._get_features_and_labels_from_input_fn(input_fn, ModeKeys.TRAIN))
  File "/home/shunyu/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1037, in _get_features_and_labels_from_input_fn
    self._call_input_fn(input_fn, mode))
  File "/home/shunyu/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3062, in _call_input_fn
    return input_fn(**kwargs)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/mesh_tensorflow/transformer/utils.py", line 1792, in input_fn
    dataset = train_dataset_fn(
  File "/anaconda/envs/t5/lib/python3.8/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/anaconda/envs/t5/lib/python3.8/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/shunyu/container/Project/text-to-text-transfer-transformer/t5/models/mesh_transformer.py", line 77, in mesh_train_dataset_fn
    ds = mixture_or_task.get_dataset(
  File "/anaconda/envs/t5/lib/python3.8/site-packages/seqio/dataset_providers.py", line 1041, in get_dataset
    ds = source.get_dataset(split=split, shuffle=shuffle, seed=seed)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/seqio/dataset_providers.py", line 371, in get_dataset
    return self.tfds_dataset.load(
  File "/anaconda/envs/t5/lib/python3.8/site-packages/seqio/utils.py", line 130, in load
    return tfds.load(
  File "/anaconda/envs/t5/lib/python3.8/site-packages/tensorflow_datasets/core/load.py", line 346, in load
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/tensorflow_datasets/core/dataset_builder.py", line 385, in download_and_prepare
    self._download_and_prepare(
  File "/anaconda/envs/t5/lib/python3.8/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1022, in _download_and_prepare
    super(GeneratorBasedBuilder, self)._download_and_prepare(
  File "/anaconda/envs/t5/lib/python3.8/site-packages/tensorflow_datasets/core/dataset_builder.py", line 961, in _download_and_prepare
    for split_generator in self._split_generators(
  File "/anaconda/envs/t5/lib/python3.8/site-packages/tensorflow_datasets/text/glue.py", line 448, in _split_generators
    dl_dir = dl_manager.download_and_extract(self.builder_config.data_url)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/tensorflow_datasets/core/download/download_manager.py", line 603, in download_and_extract
    return _map_promise(self._download_extract, url_or_urls)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/tensorflow_datasets/core/download/download_manager.py", line 636, in _map_promise
    res = tf.nest.map_structure(lambda p: p.get(), all_promises)  # Wait promises
  File "/home/shunyu/.local/lib/python3.8/site-packages/tensorflow/python/util/nest.py", line 867, in map_structure
    structure[0], [func(*x) for x in entries],
  File "/home/shunyu/.local/lib/python3.8/site-packages/tensorflow/python/util/nest.py", line 867, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "/anaconda/envs/t5/lib/python3.8/site-packages/tensorflow_datasets/core/download/download_manager.py", line 636, in <lambda>
    res = tf.nest.map_structure(lambda p: p.get(), all_promises)  # Wait promises
  File "/anaconda/envs/t5/lib/python3.8/site-packages/promise/promise.py", line 512, in get
    return self._target_settled_value(_raise=True)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/promise/promise.py", line 516, in _target_settled_value
    return self._target()._settled_value(_raise)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/promise/promise.py", line 226, in _settled_value
    reraise(type(raise_val), raise_val, self._traceback)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/six.py", line 703, in reraise
    raise value
  File "/anaconda/envs/t5/lib/python3.8/site-packages/promise/promise.py", line 844, in handle_future_result
    resolve(future.result())
  File "/anaconda/envs/t5/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/anaconda/envs/t5/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/anaconda/envs/t5/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/tensorflow_datasets/core/download/downloader.py", line 184, in _sync_download
    with _open_url(url) as (response, iter_content):
  File "/anaconda/envs/t5/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/tensorflow_datasets/core/download/downloader.py", line 231, in _open_with_requests
    _assert_status(response)
  File "/anaconda/envs/t5/lib/python3.8/site-packages/tensorflow_datasets/core/download/downloader.py", line 258, in _assert_status
    raise DownloadError('Failed to get url {}. HTTP code: {}.'.format(
tensorflow_datasets.core.download.downloader.DownloadError: Failed to get url https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FRTE.zip?alt=media&token=5efa7e85-a0bb-4f19-8ea2-9e1840f077fb. HTTP code: 403.
  In call to configurable 'mesh_train_dataset_fn' (<function mesh_train_dataset_fn at 0x7f3fd8aeb790>)
  In call to configurable 'run' (<function run at 0x7f3fd8b630d0>)
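
The last line of the traceback reports HTTP code 403 for the RTE download URL. One way to check whether the server rejects the request independently of T5 and TFDS is to query the URL with the standard library alone. The following is a minimal sketch, not part of either project: the helper name `http_status` and its `opener` parameter are hypothetical and exist only for illustration; the URL is copied from the log above.

```python
import urllib.error
import urllib.request

# URL copied verbatim from the DownloadError message in the traceback.
RTE_URL = (
    "https://firebasestorage.googleapis.com/v0/b/"
    "mtl-sentence-representations.appspot.com/o/data%2FRTE.zip"
    "?alt=media&token=5efa7e85-a0bb-4f19-8ea2-9e1840f077fb"
)


def http_status(url, opener=urllib.request.urlopen, timeout=30):
    """Return the HTTP status code for `url`.

    4xx/5xx responses are returned as integers instead of being raised.
    Network-level failures (DNS, refused connection) still raise URLError.
    `opener` is a hypothetical hook so the function can be exercised
    without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx responses; the code is on the exception.
        return err.code
```

Calling `http_status(RTE_URL)` from the same machine should show whether the 403 also appears outside the training job. If it does, the problem most likely lies with the hosted file or its access token rather than with the local T5 setup.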

Who can help?

Hi, we use TensorFlow Datasets (TFDS) to download and prepare datasets, and that's where this error occurs: TFDS can't access the download URL (the server returns HTTP 403), and I'm not sure why. You should open an issue at https://github.com/tensorflow/datasets.
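
When filing that issue, it may help to reproduce the failure with TFDS alone, without T5 or Mesh TensorFlow in the call stack. The sketch below assumes `tensorflow_datasets` is installed and that the TFDS names are `glue/rte` (as shown in the log) and `glue/mrpc`; the helpers `try_load` and `compare` and the `load_fn` parameter are hypothetical names introduced here for illustration.

```python
def try_load(name, data_dir, load_fn=None):
    """Try to download and prepare one TFDS dataset.

    Returns None on success, or a short "ErrorType: message" string on failure.
    `load_fn` defaults to `tfds.load`; it is injectable only so the helper
    can be exercised without TFDS or network access.
    """
    if load_fn is None:
        import tensorflow_datasets as tfds  # imported lazily on purpose
        load_fn = tfds.load
    try:
        load_fn(name, data_dir=data_dir)
    except Exception as err:  # broad on purpose: we only want a summary
        return "{}: {}".format(type(err).__name__, err)
    return None


def compare(names, data_dir, load_fn=None):
    """Map each dataset name to its load result (None means it loaded)."""
    return {name: try_load(name, data_dir, load_fn) for name in names}
```

For example, `compare(["glue/mrpc", "glue/rte"], data_dir)` with `data_dir` set to the same directory as `DATA_DIR` would be expected to mirror the behavior reported above: no error for MRPC and a `DownloadError` summary for RTE. Note that a dataset already prepared in `data_dir` is loaded from disk and does not exercise the download.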