GanjinZero/math401-llm

Duplicate cases in the test set

yanyc428 opened this issue · 1 comment

We found duplicate cases in the test set.

e.g.
line 253 {"query": "3^3=", "response": "27"}
line 271 {"query": "3^3=", "response": "27"}

Deduplicating these cases might improve the test set.
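
A minimal sketch of how such duplicates could be listed, assuming the test set is a JSON Lines file where every line has a "query" key as in the example above. The function name `find_duplicate_queries` and the `sample` data are hypothetical and only for illustration; this is not the script used by the repository.

```python
import json
from collections import defaultdict


def find_duplicate_queries(lines):
    """Map each repeated query to the 1-based line numbers where it appears."""
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        seen[record["query"]].append(lineno)
    # Keep only queries that occur on more than one line.
    return {query: nums for query, nums in seen.items() if len(nums) > 1}


# Hypothetical sample in the same shape as the test set entries.
sample = [
    '{"query": "3^3=", "response": "27"}',
    '{"query": "1+1=", "response": "2"}',
    '{"query": "3^3=", "response": "27"}',
]
duplicates = find_duplicate_queries(sample)
print(duplicates)  # {'3^3=': [1, 3]}
```

The function accepts any iterable of lines, so an open file handle for the test set can be passed in directly. It only reports duplicates and leaves the file unchanged.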

Yes, we found this problem after our first arXiv publication. However, fixing it would require rerunning all experiments, which would be costly and would make the analysis messy.