microsoft/promptbench

Some datasets are not available

koshyviv opened this issue · 5 comments

I'm not sure if I'm missing something, but I tried loading MMLU and SQuADv2, and it always gives me an error (the JSON file is empty).

It seems to fail to fetch from the URL provided here, since the data folder does not exist:

        # check if the dataset exists, if not, download it
        self.filepath = os.path.join(self.data_dir, f"{dataset_name}.json")
        self.filepath2 = os.path.join(self.data_dir, f"{dataset_name}.jsonl")
        if not os.path.exists(self.filepath):
            if os.path.exists(self.filepath2):
                self.filepath = self.filepath2
            else:
                url = f'https://wjdcloud.blob.core.windows.net/dataset/promptbench/dataset/{dataset_name}.json'
                print(f"Downloading {dataset_name} dataset...")
                response = requests.get(url)
                with open(self.filepath, 'wb') as f:
                    f.write(response.content)
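
For anyone hitting the same thing: the snippet above writes whatever the server returns straight to disk, so a failed request can leave an empty .json file that later passes the os.path.exists check. Below is a minimal sketch of a more defensive download, not the actual promptbench fix. It assumes the standard-library urllib in place of requests, and the helper names is_usable_dataset_file and download_dataset are hypothetical.

```python
import json
import os
import urllib.request


def is_usable_dataset_file(path):
    """Return True if path is a non-empty file that parses as JSON (or JSON Lines)."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    try:
        with open(path, "r", encoding="utf-8") as f:
            if path.endswith(".jsonl"):
                # JSON Lines: every non-blank line must be valid JSON.
                records = [json.loads(line) for line in f if line.strip()]
                return len(records) > 0
            json.load(f)
            return True
    except ValueError:
        # Covers json.JSONDecodeError and UnicodeDecodeError.
        return False


def download_dataset(url, filepath, timeout=30):
    """Download url to filepath, raising instead of saving an empty file."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        content = response.read()
    if not content:
        raise ValueError(f"Empty response body from {url}")
    # Create the data folder if it does not exist yet.
    os.makedirs(os.path.dirname(filepath) or ".", exist_ok=True)
    # Write to a temporary name first so a partial file is never left behind.
    tmp_path = filepath + ".part"
    with open(tmp_path, "wb") as f:
        f.write(content)
    os.replace(tmp_path, filepath)
    return filepath
```

With helpers like these, a loader could call is_usable_dataset_file(self.filepath) instead of os.path.exists(self.filepath), so a stale empty file from an earlier failed run triggers a fresh download rather than a JSON parsing error.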

Thanks for highlighting this issue. We have fixed this in the latest version. Please clone the repository with the following command: git clone git@github.com:microsoft/promptbench.git. Then you can use dataset = pb.DatasetLoader.load_dataset("mmlu") to download the dataset from HuggingFace.

I'm still facing the same issue. I cloned the latest commit, 1958b662cc8a866119f350779de39e5d6203b660.

Could you please provide the detailed error messages and an MRE? Thanks.

I'm sorry, I just cloned afresh and tried again, and it's working now. Thanks for the quick help!

Delighted to hear it's resolved! Please don't hesitate to reach out for any further issues.