Error on MTOPDomainClassification task
ZiyiXia opened this issue · 5 comments
When running evaluation on the MTEB English benchmark, I got the following error during the MTOPDomainClassification task:
ERROR:mteb.evaluation.MTEB:Error while evaluating MTOPDomainClassification: Consistency check failed: file should be of size 2191 but has size 2190 ((…)62165c59d59d0034df9fff0bf/mtop_domain.py).
We are sorry for the inconvenience. Please retry with `force_download=True`.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
Traceback (most recent call last):
File "/share/project/xzy/test/mteb_eval.py", line 56, in <module>
evaluation.run(
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/mteb/evaluation/MTEB.py", line 422, in run
raise e
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/mteb/evaluation/MTEB.py", line 352, in run
task.load_data(eval_splits=task_eval_splits, **kwargs)
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/mteb/abstasks/MultiSubsetLoader.py", line 15, in load_data
self.slow_load()
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/mteb/abstasks/MultiSubsetLoader.py", line 44, in slow_load
self.dataset[lang] = datasets.load_dataset(
^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/datasets/load.py", line 2606, in load_dataset
builder_instance = load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/datasets/load.py", line 2277, in load_dataset_builder
dataset_module = dataset_module_factory(
^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/datasets/load.py", line 1923, in dataset_module_factory
raise e1 from None
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/datasets/load.py", line 1896, in dataset_module_factory
).get_module()
^^^^^^^^^^^^
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/datasets/load.py", line 1507, in get_module
local_path = self.download_loading_script()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/datasets/load.py", line 1467, in download_loading_script
return cached_path(file_path, download_config=download_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/datasets/utils/file_utils.py", line 211, in cached_path
output_path = get_from_cache(
^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/datasets/utils/file_utils.py", line 689, in get_from_cache
fsspec_get(
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/datasets/utils/file_utils.py", line 395, in fsspec_get
fs.get_file(path, temp_file.name, callback=callback)
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/huggingface_hub/hf_file_system.py", line 648, in get_file
http_get(
File "/root/anaconda3/envs/faiss/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 578, in http_get
raise EnvironmentError(
OSError: Consistency check failed: file should be of size 2191 but has size 2190 ((…)62165c59d59d0034df9fff0bf/mtop_domain.py).
We are sorry for the inconvenience. Please retry with `force_download=True`.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
I tried to download the dataset through HF datasets directly, but got the same error as above:
from datasets import load_dataset
data = load_dataset("mteb/mtop_domain", "en", force_download=True)
Any idea how to get around this? I appreciate your help.
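For context, a minimal MTEB setup that exercises this task looks roughly like the following (a sketch; the model name and output folder are placeholders, not the exact ones from my script):

from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Placeholder embedding model; any sentence-transformers model should hit the same dataset download step.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Restricting the run to the failing task is enough to trigger the download.
evaluation = MTEB(tasks=["MTOPDomainClassification"])
evaluation.run(model, output_folder="results")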
Thanks for reporting this. I can't reproduce the error locally:
data = load_dataset("mteb/mtop_domain", "en", trust_remote_code=True)
# runs without issues
but I do get an error from using the force_download flag, which makes me believe that you are using another version of datasets. If you let me know which one, I will make sure we specify it in the requirements.
I am using the version:
import datasets
datasets.__version__
# 2.21.0
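As an aside, in datasets 2.x the supported way to force a fresh download is the download_mode argument rather than a force_download keyword; a minimal sketch, assuming the same dataset and config:

from datasets import load_dataset

# "force_redownload" bypasses any cached copy and fetches the files again.
data = load_dataset(
    "mteb/mtop_domain",
    "en",
    trust_remote_code=True,
    download_mode="force_redownload",
)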
Thanks for your response. I just checked my datasets version, which is also 2.21.0.
Could this be an error during the downloading process? I hadn't downloaded the dataset in my environment, but you might already have it downloaded, so loading the dataset locally wouldn't run into the error.
I will also open an issue in the datasets repo to see if people there have any idea what's going on.
I tried it in a Colab notebook and could not reproduce it:
https://colab.research.google.com/drive/1U6_tvysJdH-hiWUEMXN6p9COJoZi8CjG?usp=sharing
It might be worth resetting the Hugging Face cache.
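A minimal sketch of what resetting the cache could look like, assuming the default cache locations (adjust the paths if HF_HOME or HF_DATASETS_CACHE is set):

import shutil
from pathlib import Path

# Default Hugging Face cache root; the dataset loading script is cached under
# "modules", downloaded data under "datasets" and "hub".
cache_root = Path.home() / ".cache" / "huggingface"
for subdir in ("datasets", "hub", "modules"):
    shutil.rmtree(cache_root / subdir, ignore_errors=True)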
Just solved this by reinstalling huggingface_hub and datasets. Thanks for your help!
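For anyone hitting the same thing, a sketch of the reinstall, run from Python for convenience (the usual shell form is pip install --force-reinstall huggingface_hub datasets):

import subprocess
import sys

# Force-reinstall both packages in the current environment.
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "--force-reinstall", "huggingface_hub", "datasets",
])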
Wonderful, great to hear!