FYI, HumanEval 95 check dict case canonical solution is wrong
Closed this issue · 1 comment
PootieT commented
arjunguha commented
Oh, there's more than just this one... look up the degrees/radians problem in MBPP (sorry, I forget the number). We actually don't use the canonical solutions at all in MultiPL-E, so we should be okay here.
The principle we've been following: we want to fix bugs in MultiPL-E, but preserve bugs in the underlying benchmarks. Hopefully, that will make comparisons easier to do. Let me know if you have other ideas.
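For context, HumanEval/95 asks for `check_dict_case`. It should return True only when the dictionary is non-empty and either every key is a lowercase string or every key is an uppercase string. Below is a minimal sketch of that behavior, written from the task description rather than taken from the benchmark's canonical solution. It checks every key, and the parameter name `d` is my own choice.

```python
def check_dict_case(d: dict) -> bool:
    """Sketch of the HumanEval/95 behavior (written from the task description,
    not the benchmark's canonical solution): True iff d is non-empty and all
    keys are lowercase strings, or all keys are uppercase strings."""
    if not d:
        # The task specifies that an empty dictionary yields False.
        return False
    keys = list(d.keys())
    # Any non-string key makes the answer False.
    if not all(isinstance(k, str) for k in keys):
        return False
    # Examine every key rather than stopping early.
    return all(k.islower() for k in keys) or all(k.isupper() for k in keys)
```

Because it inspects every key, a dictionary such as `{"a": 1, "b": 2, "C": 3}` returns False under this sketch.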