Evaluation on ANLI fails with `ValueError`
fepegar opened this issue · 3 comments
fepegar commented
Hi and thanks for open-sourcing this!
I'm trying to run the evaluation script for the ANLI dataset with the following command:
python run_eval.py \
    --dataset_name anli \
    --template_name "should assume" \
    --model_name_or_path bigscience/T0pp \
    --output_dir ./debug \
    --parallelize
I get the following error:
Traceback (most recent call last):
  File "radt5-dev/evaluate_t0_big_science.py", line 389, in <module>
    main()
  File "radt5-dev/evaluate_t0_big_science.py", line 385, in main
    run(args)
  File "radt5-dev/evaluate_t0_big_science.py", line 289, in run
    preprocess_function, batched=True, remove_columns=column_names
  File "/azureml-envs/azureml_030d8ea9f3a01ad1b81e1990cbdde727/lib/python3.7/site-packages/datasets/dataset_dict.py", line 782, in map
    for k, dataset in self.items()
  File "/azureml-envs/azureml_030d8ea9f3a01ad1b81e1990cbdde727/lib/python3.7/site-packages/datasets/dataset_dict.py", line 782, in <dictcomp>
    for k, dataset in self.items()
  File "/azureml-envs/azureml_030d8ea9f3a01ad1b81e1990cbdde727/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 2329, in map
    f"Column to remove {list(filter(lambda col: col not in self._data.column_names, remove_columns))} not in the dataset. Current columns in the dataset: {self._data.column_names}"
ValueError: Column to remove ['train_r1', 'dev_r1', 'test_r1', 'train_r2', 'dev_r2', 'test_r2', 'train_r3', 'dev_r3', 'test_r3'] not in the dataset. Current columns in the dataset: ['uid', 'premise', 'hypothesis', 'label', 'reason']
Is there a known fix for this?
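One plausible reading of the error: the names it lists are ANLI split names, not columns. In `datasets`, `DatasetDict.column_names` is a dict keyed by split, so iterating over it yields split names. The sketch below reproduces that effect with plain Python; the variable names are illustrative, and whether the script ends up on exactly this path is an assumption.

```python
# Columns actually present in each ANLI split, per the error message.
actual_columns = ["uid", "premise", "hypothesis", "label", "reason"]
split_names = [
    "train_r1", "dev_r1", "test_r1",
    "train_r2", "dev_r2", "test_r2",
    "train_r3", "dev_r3", "test_r3",
]

# Stand-in for DatasetDict.column_names: a dict of split name -> column list.
column_names = {split: list(actual_columns) for split in split_names}

# Same kind of filter as the check in the traceback. Iterating over a dict
# yields its keys, so the "columns to remove" are really the split names.
missing = [col for col in column_names if col not in actual_columns]
print(missing)
```

Every split name ends up in `missing`, which matches the list in the `ValueError`.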
VictorSanh commented
Hi @fepegar, could you try specifying the `dataset_config_name`?
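A minimal sketch of what that could look like, assuming the flag is spelled `--dataset_config_name` (matching the argument named above) and that one of the split names from the error message, here `dev_r1`, is an acceptable value. Both are assumptions; the README is the authority on the accepted values. `build_eval_command` is a hypothetical helper, not part of the repository.

```python
import shlex


def build_eval_command(dataset_config_name):
    """Assemble the original command plus an explicit dataset config name."""
    args = [
        "python", "run_eval.py",
        "--dataset_name", "anli",
        # Assumed flag name and value; check the README for what is accepted.
        "--dataset_config_name", dataset_config_name,
        "--template_name", "should assume",
        "--model_name_or_path", "bigscience/T0pp",
        "--output_dir", "./debug",
        "--parallelize",
    ]
    # Quote each argument so the template name survives as one shell word.
    return " ".join(shlex.quote(arg) for arg in args)


command = build_eval_command("dev_r1")
print(command)
```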
fepegar commented
My bad for not reading the README properly. But maybe the script could handle the issue more gracefully. I'll close this and reopen if it doesn't work. Thanks for your reply, @VictorSanh.
VictorSanh commented
You are right, I added sanity checks: 7a699e3
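For readers wondering what such a check might look like, here is a hedged sketch. It is not the content of the commit above; `check_dataset_args` and `ANLI_SPLITS` are hypothetical names, and the split list is taken from the error message earlier in the thread.

```python
# Split names as they appear in the ValueError reported in this issue.
ANLI_SPLITS = (
    "train_r1", "dev_r1", "test_r1",
    "train_r2", "dev_r2", "test_r2",
    "train_r3", "dev_r3", "test_r3",
)


def check_dataset_args(dataset_name, dataset_config_name):
    """Fail early with a readable message instead of a late column error."""
    if dataset_name == "anli" and dataset_config_name not in ANLI_SPLITS:
        raise ValueError(
            "For ANLI, dataset_config_name must be one of "
            f"{list(ANLI_SPLITS)}, got {dataset_config_name!r}."
        )


# Passes silently when a known split is given.
check_dataset_args("anli", "dev_r1")

# Raises a clear error when the config name is missing.
try:
    check_dataset_args("anli", None)
except ValueError as err:
    print(err)
```

Running the check right after argument parsing would surface the problem before any model or dataset is loaded.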