OpenGVLab/OmniQuant

Error when evaluating MMLU

Opened this issue · 2 comments

I added --tasks hendrycksTest to my command, but got this error:

Selected Tasks: ['hendrycksTest-college_medicine', 'hendrycksTest-high_school_macroeconomics', 'hendrycksTest-security_studies', 'hendrycksTest-computer_security', 'hendrycksTest-philosophy', 'hendrycksTest-moral_disputes', 'hendrycksTest-high_school_computer_science', 'hendrycksTest-virology', 'hendrycksTest-college_biology', 'hendrycksTest-business_ethics', 'hendrycksTest-college_computer_science', 'hendrycksTest-college_mathematics', 'hendrycksTest-electrical_engineering', 'hendrycksTest-high_school_government_and_politics', 'hendrycksTest-human_sexuality', 'hendrycksTest-conceptual_physics', 'hendrycksTest-us_foreign_policy', 'hendrycksTest-high_school_world_history', 'hendrycksTest-professional_medicine', 'hendrycksTest-jurisprudence', 'hendrycksTest-machine_learning', 'hendrycksTest-miscellaneous', 'hendrycksTest-college_physics', 'hendrycksTest-medical_genetics', 'hendrycksTest-college_chemistry', 'hendrycksTest-high_school_psychology', 'hendrycksTest-elementary_mathematics', 'hendrycksTest-anatomy', 'hendrycksTest-astronomy', 'hendrycksTest-international_law', 'hendrycksTest-human_aging', 'hendrycksTest-moral_scenarios', 'hendrycksTest-professional_psychology', 'hendrycksTest-world_religions', 'hendrycksTest-high_school_european_history', 'hendrycksTest-marketing', 'hendrycksTest-prehistory', 'hendrycksTest-formal_logic', 'hendrycksTest-logical_fallacies', 'hendrycksTest-professional_accounting', 'hendrycksTest-abstract_algebra', 'hendrycksTest-high_school_physics', 'hendrycksTest-high_school_geography', 'hendrycksTest-management', 'hendrycksTest-nutrition', 'hendrycksTest-clinical_knowledge', 'hendrycksTest-high_school_mathematics', 'hendrycksTest-global_facts', 'hendrycksTest-high_school_microeconomics', 'hendrycksTest-professional_law', 'hendrycksTest-econometrics', 'hendrycksTest-sociology', 'hendrycksTest-high_school_us_history', 'hendrycksTest-high_school_biology', 'hendrycksTest-high_school_chemistry', 'hendrycksTest-high_school_statistics', 
'hendrycksTest-public_relations']
/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/load.py:1461: FutureWarning: The repository for hendrycks_test contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/hendrycks_test
You can avoid this message in future by passing the argument trust_remote_code=True.
Passing trust_remote_code=True will be mandatory to load this dataset from the next major release of datasets.
warnings.warn(
Traceback (most recent call last):
File "/root/zeroshot/eval_zero_shot.py", line 392, in <module>
main()
File "/root/zeroshot/eval_zero_shot.py", line 388, in main
evaluate(lm, args,logger)
File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/zeroshot/eval_zero_shot.py", line 183, in evaluate
t_results = evaluator.simple_evaluate(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/zeroshot/lm_eval/utils.py", line 185, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/root/zeroshot/lm_eval/evaluator.py", line 66, in simple_evaluate
task_dict = lm_eval.tasks.get_task_dict(task_names)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/zeroshot/lm_eval/tasks/__init__.py", line 342, in get_task_dict
task_name_dict = {
^
File "/root/zeroshot/lm_eval/tasks/__init__.py", line 343, in <dictcomp>
task_name: get_task(task_name)()
^^^^^^^^^^^^^^^^^^^^^
File "/root/zeroshot/lm_eval/tasks/hendrycks_test.py", line 100, in __init__
super().__init__(subject)
File "/root/zeroshot/lm_eval/tasks/hendrycks_test.py", line 112, in __init__
super().__init__()
File "/root/zeroshot/lm_eval/base.py", line 412, in __init__
self.download(data_dir, cache_dir, download_mode)
File "/root/zeroshot/lm_eval/base.py", line 441, in download
self.dataset = datasets.load_dataset(
^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/load.py", line 2582, in load_dataset
builder_instance.download_and_prepare(
File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/builder.py", line 1005, in download_and_prepare
self._download_and_prepare(
File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/builder.py", line 1767, in _download_and_prepare
super()._download_and_prepare(
File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/builder.py", line 1100, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/builder.py", line 1565, in _prepare_split
split_info = self.info.splits[split_generator.name]
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/splits.py", line 532, in __getitem__
instructions = make_file_instructions(
^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/arrow_reader.py", line 115, in make_file_instructions
raise TypeError(f"Expected str 'name', but got: {type(name).__name__}")
TypeError: Expected str 'name', but got: NoneType
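The final TypeError comes from datasets when the split lookup yields None instead of a split name. A simplified sketch of that guard (not the actual library code) shows why an unresolvable dataset repository ends in this message:

```python
def check_split_name(name):
    # Simplified version of the type guard in datasets/arrow_reader.py:
    # make_file_instructions requires the split name to be a string.
    # When a dataset's split metadata cannot be resolved (e.g. the
    # repository was moved), the lookup yields None and this fires.
    if not isinstance(name, str):
        raise TypeError(f"Expected str 'name', but got: {type(name).__name__}")
    return name

try:
    check_split_name(None)
except TypeError as err:
    print(err)  # Expected str 'name', but got: NoneType
```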

I'm also hitting this issue.

It is because the MMLU dataset on the Hugging Face Hub was renamed to cais/mmlu. You need to change the dataset path at

DATASET_PATH = "hendrycks_test"
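A sketch of the change (the constant lives in lm_eval/tasks/hendrycks_test.py per the traceback; the comment is my paraphrase, not the harness's own wording):

```python
# lm_eval/tasks/hendrycks_test.py
# The "hendrycks_test" repository on the Hub no longer resolves; the
# dataset was renamed, so point the task at the new location instead.
DATASET_PATH = "cais/mmlu"  # was: DATASET_PATH = "hendrycks_test"
```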

Even though I can run the code now, I can't reproduce the reported results, even the FP16 ones. I'm not sure whether cais/mmlu is exactly the same as hendrycks_test.
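For reference, the lm-eval task names in the log above are just a "hendrycksTest-" prefix plus the MMLU subject, and the subject is what gets passed to load_dataset as the config name. A small helper (the function name is mine, not the harness's) that recovers it:

```python
def subject_from_task(task_name: str) -> str:
    # Task names like "hendrycksTest-college_medicine" encode the MMLU
    # subject after the prefix; the subject doubles as the dataset
    # config name, so the same subjects should work against cais/mmlu.
    prefix = "hendrycksTest-"
    if not task_name.startswith(prefix):
        raise ValueError(f"not a hendrycksTest task: {task_name}")
    return task_name[len(prefix):]

print(subject_from_task("hendrycksTest-college_medicine"))  # college_medicine
```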