Some datasets are not available
koshyviv opened this issue · 5 comments
I'm not sure if I'm missing something, but I tried loading MMLU and SQuADv2, and it always gives me an error (the JSON file is empty).
It seems to fail to fetch from the URL provided here, because the data
folder does not exist:
# check if the dataset exists, if not, download it
self.filepath = os.path.join(self.data_dir, f"{dataset_name}.json")
self.filepath2 = os.path.join(self.data_dir, f"{dataset_name}.jsonl")
if not os.path.exists(self.filepath):
    if os.path.exists(self.filepath2):
        self.filepath = self.filepath2
    else:
        url = f'https://wjdcloud.blob.core.windows.net/dataset/promptbench/dataset/{dataset_name}.json'
        print(f"Downloading {dataset_name} dataset...")
        response = requests.get(url)
        with open(self.filepath, 'wb') as f:
            f.write(response.content)
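The snippet above writes whatever the server returns straight to disk, without checking the response and without creating the data folder first, which is consistent with the "empty JSON file" symptom. Below is a minimal, hypothetical sketch of a more defensive version of the same check-then-download logic. It is not PromptBench's actual fix: the function name `ensure_dataset` and its `base_url` parameter are assumptions for illustration, and it uses the standard-library `urllib.request` instead of `requests`. It creates the folder, re-downloads when the cached file is empty, and refuses to cache a response that is empty or not valid JSON.

```python
import json
import os
import tempfile
import urllib.request


def ensure_dataset(data_dir: str, dataset_name: str, base_url: str) -> str:
    """Return a local path to the dataset, downloading it only if needed.

    Hypothetical helper (not part of PromptBench): it creates the data
    folder and refuses to cache a download that is empty or not valid JSON.
    """
    json_path = os.path.join(data_dir, f"{dataset_name}.json")
    jsonl_path = os.path.join(data_dir, f"{dataset_name}.jsonl")

    # Reuse an existing, non-empty local copy (.json first, then .jsonl).
    # An empty cached file is treated as missing and downloaded again.
    for path in (json_path, jsonl_path):
        if os.path.exists(path) and os.path.getsize(path) > 0:
            return path

    # Create the folder up front so writing the file cannot fail on a missing directory.
    os.makedirs(data_dir, exist_ok=True)

    url = f"{base_url.rstrip('/')}/{dataset_name}.json"
    # For http(s) URLs, urlopen raises HTTPError on 4xx/5xx responses,
    # so a missing remote file is reported instead of being written to disk.
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = response.read()

    if not payload:
        raise ValueError(f"empty response from {url}")
    json.loads(payload)  # raises json.JSONDecodeError (a ValueError) if not JSON

    # Write to a temporary file in the same folder, then rename it into place,
    # so an interrupted download never leaves a partial dataset file behind.
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, suffix=".part")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    os.replace(tmp_path, json_path)
    return json_path
```

With the URL from the issue, a call would look like `ensure_dataset("data", "mmlu", "https://wjdcloud.blob.core.windows.net/dataset/promptbench/dataset")`. A missing or empty remote file then raises an error instead of leaving an empty `mmlu.json` that later loads fail on.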
Thanks for highlighting this issue. We have fixed this in the latest version. Please clone the repository with `git clone git@github.com:microsoft/promptbench.git`, and then use `dataset = pb.DatasetLoader.load_dataset("mmlu")`
to download the dataset from Hugging Face.
I'm still facing the same issue; I cloned the latest commit 1958b662cc8a866119f350779de39e5d6203b660.
Sorry, I just cloned afresh and tried again, and it's working now. Thanks for the quick help!
Delighted to hear it's resolved! Please don't hesitate to reach out with any further issues.