stanford-crfm/helm

Add MMLU-Pro

yifanmai opened this issue

https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro

This should be similar to the original MMLU: see mmlu_scenario.py for the original MMLU scenario and air_bench_scenario.py for how to load a Hugging Face dataset with load_dataset().
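A minimal sketch of the row-to-instance conversion, assuming MMLU-Pro rows expose question, options, answer_index and category columns (my reading of the dataset card; verify before relying on them). In a real scenario the rows would come from load_dataset("TIGER-Lab/MMLU-Pro", split="test"); here they are passed in directly so the sketch runs offline. Reference, Instance and rows_to_instances are simplified, hypothetical stand-ins, not HELM's actual classes.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

CORRECT_TAG = "correct"  # tag marking the right answer (stand-in for HELM's constant)


@dataclass
class Reference:
    """Hypothetical stand-in: one answer option plus its tags."""
    text: str
    tags: List[str] = field(default_factory=list)


@dataclass
class Instance:
    """Hypothetical stand-in: one question with all of its options."""
    question: str
    references: List[Reference]
    split: str


def rows_to_instances(rows: Iterable[dict], subject: str, split: str = "test") -> List[Instance]:
    """Keep rows whose category matches `subject` and turn each into an MCQA instance.

    Assumed MMLU-Pro columns: question, options, answer_index, category.
    """
    instances = []
    for row in rows:
        if row["category"] != subject:
            continue
        references = [
            Reference(text=option, tags=[CORRECT_TAG] if i == row["answer_index"] else [])
            for i, option in enumerate(row["options"])
        ]
        instances.append(Instance(question=row["question"], references=references, split=split))
    return instances


# Toy rows in the assumed MMLU-Pro shape (not real dataset content).
rows = [
    {"question": "What is 2 + 2?", "options": ["3", "4", "5", "22"], "answer_index": 1, "category": "math"},
    {"question": "Which organ pumps blood?", "options": ["Lung", "Heart", "Liver"], "answer_index": 1, "category": "biology"},
]
math_instances = rows_to_instances(rows, subject="math")
print(len(math_instances))
print([r.text for r in math_instances[0].references if CORRECT_TAG in r.tags])
```

Filtering on the category column plays the role that the subject argument plays for the original MMLU run spec.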

Edit: Also look at simple_scenarios.py and test_simple_scenarios.py for an example of a multiple-choice question answering (MCQA) scenario.
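One practical difference from MMLU: MMLU-Pro questions have up to ten options instead of four, so option labels run past D (A through J). The sketch below shows what a joint multiple-choice prompt looks like when every option is listed in one prompt and the model answers with a letter. format_joint_prompt and correct_letter are hypothetical helpers, and the "Question:"/"Answer:" labels are assumptions; in HELM the adapter builds this prompt from the instance's references.

```python
from string import ascii_uppercase
from typing import List


def format_joint_prompt(question: str, options: List[str]) -> str:
    """Render a question and all of its options in one prompt, labelled A, B, C, ..."""
    if len(options) > len(ascii_uppercase):
        raise ValueError("too many options to label with single letters")
    lines = [f"Question: {question}"]
    for letter, option in zip(ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")  # the model is expected to continue with a letter
    return "\n".join(lines)


def correct_letter(answer_index: int) -> str:
    """Map a 0-based answer index to its option letter (0 -> A, 9 -> J)."""
    return ascii_uppercase[answer_index]


prompt = format_joint_prompt("What is 2 + 2?", ["3", "4", "5", "22"])
print(prompt)
print(correct_letter(1))
```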

Edit 2: Also see this doc.

Edit 3: To create the run spec function, take this function in lite_run_specs.py:

```python
@run_spec_function("mmlu")
def get_mmlu_spec(subject: str, method: str = ADAPT_MULTIPLE_CHOICE_JOINT) -> RunSpec:
    ...  # body omitted here; see lite_run_specs.py
```

and modify it so that mmlu becomes mmlu-pro. You should then be able to run the new scenario with helm-run.
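Why the rename matters: the string passed to the decorator is the name a run entry refers to, so an entry such as mmlu-pro:subject=math resolves to the new function. The sketch below is a simplified, hypothetical stand-in for that registry-and-lookup step (RUN_SPEC_FUNCTIONS, parse_run_entry and the dict-returning get_mmlu_pro_spec are illustrative names, not HELM's code). The string value used for the joint method is an assumption, and real HELM run entries also carry other keys such as the model, which this sketch does not handle.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for HELM's registry of run spec functions.
RUN_SPEC_FUNCTIONS: Dict[str, Callable[..., dict]] = {}


def run_spec_function(name: str):
    """Register the decorated function under `name` (simplified stand-in)."""
    def wrap(func):
        if name in RUN_SPEC_FUNCTIONS:
            raise ValueError(f"duplicate run spec function name: {name}")
        RUN_SPEC_FUNCTIONS[name] = func
        return func
    return wrap


@run_spec_function("mmlu-pro")
def get_mmlu_pro_spec(subject: str, method: str = "multiple_choice_joint") -> dict:
    # A real run spec function returns a RunSpec; a dict keeps this sketch self-contained.
    return {"name": f"mmlu-pro:subject={subject},method={method}", "subject": subject, "method": method}


def parse_run_entry(entry: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name:key=value,key=value' into the function name and its keyword arguments."""
    name, _, arg_text = entry.partition(":")
    args = dict(pair.split("=", 1) for pair in arg_text.split(",") if pair)
    return name, args


name, args = parse_run_entry("mmlu-pro:subject=math")
spec = RUN_SPEC_FUNCTIONS[name](**args)
print(spec["name"])
```

If the decorator name were left as mmlu, the stand-in registry would reject the duplicate, which is one reason the name must change along with the scenario.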