FYI, HumanEval 95 check dict case canonical solution is wrong

Question

FYI, HumanEval 95 check dict case canonical solution is wrong
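The issue does not say what the bug is. As a point of comparison, here is a minimal sketch of `check_dict_case` written directly from the problem statement as we read it: an empty dict returns False; otherwise return True only if every key is a string and the keys are either all lower case or all upper case. This is our own illustration, not the benchmark's canonical solution, and the parameter name `d` is ours. Because it inspects every key, inputs whose later keys break the pattern are a useful spot check against any candidate solution.

```python
def check_dict_case(d: dict) -> bool:
    """Sketch of HumanEval/95 as we read its spec (not the canonical solution)."""
    if not d:
        # The spec says an empty dictionary returns False.
        return False
    keys = list(d.keys())
    if not all(isinstance(k, str) for k in keys):
        # Any non-string key makes the answer False.
        return False
    # Check every key, not just the first few. Note that str.islower() and
    # str.isupper() are False for strings with no cased characters.
    return all(k.islower() for k in keys) or all(k.isupper() for k in keys)


if __name__ == "__main__":
    print(check_dict_case({"a": "apple", "b": "banana"}))                     # True
    print(check_dict_case({"a": "apple", "A": "banana", "B": "banana"}))      # False
    print(check_dict_case({"a": "apple", 8: "banana"}))                       # False
    print(check_dict_case({"Name": "John", "Age": "36", "City": "Houston"}))  # False
    print(check_dict_case({"STATE": "NC", "ZIP": "12345"}))                   # True
    print(check_dict_case({}))                                                # False
    # A later key that breaks the pattern must also be caught.
    print(check_dict_case({"a": 1, "b": 2, "C": 3}))                          # False
```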

Closed this issue a year ago · 1 comment

Answer 1 · 2023-04-19T20:33:08.000Z

Oh, this isn't the only one. Look up the degrees/radians problem in MBPP (sorry, I forget the number). We don't actually use the canonical solutions at all in MultiPL-E, so we should be okay here.

The principle we've been following: fix bugs in MultiPL-E itself, but preserve bugs in the underlying benchmarks. Hopefully, that will make comparisons with the original benchmarks easier. Let me know if you have other ideas.