
Evaluation on ANLI fails with `ValueError`

fepegar opened this issue · 3 comments

Hi and thanks for open-sourcing this!

I'm trying to run the evaluation script for the ANLI dataset with the following command:

python \
    --dataset_name anli \
    --template_name "should assume" \
    --model_name_or_path bigscience/T0pp \
    --output_dir ./debug \

I get the following error:

Traceback (most recent call last):
  File "radt5-dev/", line 389, in <module>
  File "radt5-dev/", line 385, in main
  File "radt5-dev/", line 289, in run
    preprocess_function, batched=True, remove_columns=column_names
  File "/azureml-envs/azureml_030d8ea9f3a01ad1b81e1990cbdde727/lib/python3.7/site-packages/datasets/", line 782, in map
    for k, dataset in self.items()
  File "/azureml-envs/azureml_030d8ea9f3a01ad1b81e1990cbdde727/lib/python3.7/site-packages/datasets/", line 782, in <dictcomp>
    for k, dataset in self.items()
  File "/azureml-envs/azureml_030d8ea9f3a01ad1b81e1990cbdde727/lib/python3.7/site-packages/datasets/", line 2329, in map
    f"Column to remove {list(filter(lambda col: col not in self._data.column_names, remove_columns))} not in the dataset. Current columns in the dataset: {self._data.column_names}"
ValueError: Column to remove ['train_r1', 'dev_r1', 'test_r1', 'train_r2', 'dev_r2', 'test_r2', 'train_r3', 'dev_r3', 'test_r3'] not in the dataset. Current columns in the dataset: ['uid', 'premise', 'hypothesis', 'label', 'reason']

Is there a known fix for this?

Hi @fepegar, could you try specifying the `dataset_config_name`?
[Screenshot, 2022-07-28 at 10:31:47 AM]
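
For readers hitting the same error, here is one plausible reading of the traceback, sketched in plain Python. It assumes that, without a `dataset_config_name`, the script ends up holding a mapping of split name to column names instead of a flat list of columns. The `columns_per_split` dict is a hypothetical stand-in, not the script's actual object. Iterating a dict yields its keys, so the split names get treated as columns to remove:

```python
# Hypothetical stand-in for what a multi-split dataset reports:
# a mapping of split name -> column names (names taken from the error message).
columns = ["uid", "premise", "hypothesis", "label", "reason"]
columns_per_split = {
    "train_r1": list(columns),
    "dev_r1": list(columns),
    "test_r1": list(columns),
}

# The columns that actually exist inside any single split.
actual_columns = columns_per_split["dev_r1"]

# Iterating over a dict walks its keys, so the "columns to remove"
# become the split names, none of which is a real column.
missing = [col for col in columns_per_split if col not in actual_columns]
print(missing)
```

Selecting a single split up front means only the real column names are passed along, which is consistent with the suggestion above.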

My bad for not reading the README properly. But maybe the script could handle the issue more gracefully. I'll close this and reopen if it doesn't work. Thanks for your reply, @VictorSanh.

You are right, I added sanity checks: 7a699e3
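
A check of this kind could look roughly like the sketch below. This is not the content of commit 7a699e3: the helper name `check_anli_config` and the constant `ANLI_SPLITS` are hypothetical, and the sketch assumes the script expects `dataset_config_name` to be one of the split names listed in the error message above.

```python
# Split names as they appear in the ValueError message, in the same order.
ANLI_SPLITS = tuple(
    f"{part}_r{rnd}" for rnd in (1, 2, 3) for part in ("train", "dev", "test")
)


def check_anli_config(dataset_name, dataset_config_name):
    """Fail early with a readable message (hypothetical helper)."""
    if dataset_name != "anli":
        return  # other datasets are not covered by this sketch
    if dataset_config_name not in ANLI_SPLITS:
        raise ValueError(
            "For ANLI, set dataset_config_name to one of "
            f"{', '.join(ANLI_SPLITS)}; got {dataset_config_name!r}."
        )


# Usage: call before loading any data.
check_anli_config("anli", "dev_r1")  # passes silently
```

Running the check before loading the dataset surfaces the missing argument directly, instead of as a column-removal error deep inside `map`.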