nuprl/MultiPL-E

FYI, the HumanEval 95 (check_dict_case) canonical solution is wrong

Closed · 1 comment

Oh, there is more than just this one. Look up the degrees/radians problem in MBPP (sorry, I forget the number). We don't actually use the canonical solutions anywhere in MultiPL-E, so this shouldn't affect us.

The principle we've been following: we want to fix bugs in MultiPL-E itself, but preserve bugs in the underlying benchmarks. Hopefully, that makes comparisons against results on the original benchmarks easier. Let me know if you have other ideas.
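
For anyone who wants to spot-check a canonical solution against the prompt, here is a minimal sketch of an independent reference for this problem. It assumes the HumanEval 95 prompt asks for True when every key is a lowercase string or every key is an uppercase string, and False for an empty dict. The name `check_dict_case_reference` and the sample input are illustrative only; they are not part of HumanEval or MultiPL-E.

```python
def check_dict_case_reference(d):
    """Hypothetical reference, written from the prompt as assumed above.

    Returns True iff d is non-empty, every key is a string, and the keys
    are either all lowercase or all uppercase.
    """
    if not d:
        return False
    keys = list(d.keys())
    # Any non-string key makes the answer False.
    if not all(isinstance(k, str) for k in keys):
        return False
    # Check every key, not just the first few.
    return all(k.islower() for k in keys) or all(k.isupper() for k in keys)


# Sample input: the case mismatch only shows up at the third key.
mixed = {"a": 1, "b": 2, "C": 3}
print(check_dict_case_reference(mixed))
```

Running a candidate solution and a reference like this on the same inputs, then comparing the results, is one way to confirm a suspected discrepancy before reporting it upstream.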