nuprl/MultiPL-E

Task HumanEval/092 has contradictory tests in Rust

The Rust version of HumanEval/092 contains the following lines:

assert_eq!(candidate(3.0, 4.0, 7.0), true);
assert_eq!(candidate(3.0, 4.0, 7.0), false);

(I think this is row 67 of the Hugging Face dataset for MultiPL-E, but I haven't checked.)

This obviously makes the tests unsatisfiable. It looks like a type-casting issue introduced when translating from Python, where the original tests read:

assert candidate(3,4,7)==True, "This prints if this assert fails 9 (also good for debugging!)"
assert candidate(3.0,4,7)==False, "This prints if this assert fails 10 (also good for debugging!)"
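
To illustrate why the two asserts collapse into one call, here is a minimal Python sketch. It assumes a type-based solution (the name `any_int_by_type` is mine, and I have not copied it from the canonical solution), and `as_rust_f64_args` is a hypothetical stand-in for the translation step, not MultiPL-E's actual translator:

```python
def any_int_by_type(x, y, z):
    # Type-based reading: every argument must be a Python int.
    if not all(isinstance(v, int) for v in (x, y, z)):
        return False
    # One of the numbers must equal the sum of the other two.
    return x + y == z or x + z == y or y + z == x


def as_rust_f64_args(*args):
    # Hypothetical stand-in for the translation: every literal becomes a float,
    # mirroring how the Rust tests pass 3.0, 4.0, 7.0.
    return tuple(float(a) for a in args)


print(any_int_by_type(3, 4, 7))    # True
print(any_int_by_type(3.0, 4, 7))  # False: 3.0 is a float, not an int

# After the cast, both test inputs are the same tuple, so no function
# can return True for one and False for the other.
print(as_rust_f64_args(3, 4, 7) == as_rust_f64_args(3.0, 4, 7))  # True
```

In Python the only difference between the two calls is the type of the first argument, and that difference is gone once every parameter is an `f64`.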

Wow, thanks. Yeah, we should make a decision on how to fix this. I'm going to guess that this affects other typed languages too.

Have you seen HumanEval+, btw? Does that address this?

No, I haven't looked into HumanEval+.

The original Python problem barely makes sense in a typed language such as Rust:

https://github.com/nuprl/MultiPL-E/blob/main/datasets/originals/HumanEval_92_any_int.py

It's not clear to me whether this should be fixed by changing the problem, removed from MultiPL-E, or just left as something that fails.

Randl commented

I would read the problem as "the number is an integer" rather than "the type of variable is integer", i.e., I'd expect

assert candidate(3.0,4,7)==True

That, however, would mean the problem doesn't match the original HumanEval, so maybe it's better to just drop it.
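
For reference, a rough Python sketch of the value-based reading ("the number is an integer"). The function name `any_int_by_value` is made up for illustration, and this is one possible interpretation, not the benchmark's reference solution:

```python
def any_int_by_value(x, y, z):
    # Value-based reading: 3.0 counts as an integer, 3.6 does not.
    if not all(float(v).is_integer() for v in (x, y, z)):
        return False
    # One of the numbers must equal the sum of the other two.
    return x + y == z or x + z == y or y + z == x


print(any_int_by_value(3, 4, 7))        # True
print(any_int_by_value(3.0, 4, 7))      # True under this reading
print(any_int_by_value(3.6, -2.2, 2))   # False: non-integer values
```

Under this reading the two conflicting asserts would both expect `true`, which is satisfiable in a typed language but no longer matches the original HumanEval test.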