TIGER-AI-Lab/MMLU-Pro

Duplicates in test split

Hello, can you help me? There are 159 questions that appear more than once in the test split. Here is the code I used to find the duplicates:

from collections import defaultdict

import datasets

test = datasets.load_dataset("TIGER-Lab/MMLU-Pro", split="test")

# Count how often each (category, question, options, answer) combination occurs.
mapping = defaultdict(int)

for item in test:
    mapping[(item["category"], item["question"], "".join(item["options"]), item["answer"])] += 1

# Print every combination that occurs more than once and count them.
count_doubles = 0
for (category, question, *_), count in mapping.items():
    if count > 1:
        print(category, repr(question))
        count_doubles += 1
print(count_doubles)
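
A possible variant of the check above, offered only as a sketch: it keys on the options as a tuple rather than a joined string, so that option lists such as ["ab", "c"] and ["a", "bc"] are not treated as the same, and it records the row indices of each duplicate group. The name `find_duplicate_groups` and the sample rows are illustrative assumptions, not part of the dataset or of the `datasets` library.

```python
from collections import defaultdict


def find_duplicate_groups(rows):
    """Return lists of row indices whose category, question, options and answer all match."""
    groups = defaultdict(list)
    for index, item in enumerate(rows):
        # A tuple keeps option boundaries intact, unlike "".join(...).
        key = (item["category"], item["question"], tuple(item["options"]), item["answer"])
        groups[key].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


# Made-up rows for illustration; they are not taken from MMLU-Pro.
sample_rows = [
    {"category": "math", "question": "1 + 1 = ?", "options": ["1", "2"], "answer": "B"},
    {"category": "math", "question": "2 + 2 = ?", "options": ["3", "4"], "answer": "B"},
    {"category": "math", "question": "1 + 1 = ?", "options": ["1", "2"], "answer": "B"},
    {"category": "math", "question": "x?", "options": ["ab", "c"], "answer": "A"},
    {"category": "math", "question": "x?", "options": ["a", "bc"], "answer": "A"},
]

duplicate_groups = find_duplicate_groups(sample_rows)
print(duplicate_groups)  # [[0, 2]]
```

The same function should accept the loaded `test` split directly, since it only iterates over rows and reads the four fields used in the original snippet.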

Thank you for pointing out these duplicates. They will have minimal impact on the evaluation results, so we have decided not to remove them for the time being, in order to keep the dataset size consistent.
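
For anyone who would still like to evaluate on a deduplicated copy locally, the repeats can be dropped on the user's side. The following is a minimal sketch, assuming that "duplicate" means identical category, question, options, and answer; the helper `first_occurrence_indices` is hypothetical and the rows are made up.

```python
def first_occurrence_indices(rows):
    """Return the indices of rows to keep, retaining only the first copy of each duplicate."""
    seen = set()
    keep = []
    for index, item in enumerate(rows):
        key = (item["category"], item["question"], tuple(item["options"]), item["answer"])
        if key not in seen:
            seen.add(key)
            keep.append(index)
    return keep


# Made-up rows for illustration; they are not taken from MMLU-Pro.
example_rows = [
    {"category": "law", "question": "Q1", "options": ["x", "y"], "answer": "A"},
    {"category": "law", "question": "Q2", "options": ["x", "y"], "answer": "B"},
    {"category": "law", "question": "Q1", "options": ["x", "y"], "answer": "A"},
]

keep = first_occurrence_indices(example_rows)
print(keep)  # [0, 1]
```

With the real split loaded through `datasets`, the resulting indices could be passed to `Dataset.select` to build the filtered copy. Scores computed that way would not be directly comparable with results reported on the full test split.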