nuprl/MultiPL-E

Java transpiled test failing with optional output


For example, with HumanEval_90_next_smallest, the Java-transpiled signature is

public static Optional<Long> nextSmallest(ArrayList<Long> lst) {

However, in the unit tests below, the expected value is not wrapped in Optional.of when the output is non-empty:

    public static void main(String[] args) {
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)1l, (long)2l, (long)3l, (long)4l, (long)5l)))).equals(2l));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)5l, (long)1l, (long)4l, (long)3l, (long)2l)))).equals(2l));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList()))).equals(Optional.empty()));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)1l, (long)1l)))).equals(Optional.empty()));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)1l, (long)1l, (long)1l, (long)1l, (long)0l)))).equals(1l));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)1l, (long)1l)))).equals(Optional.empty()));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)-35l, (long)34l, (long)12l, (long)-45l)))).equals(-35l));
    }

where the first assert should have been

assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)1l, (long)2l, (long)3l, (long)4l, (long)5l)))).equals(Optional.of(2l)));

Otherwise, no generated function with return type Optional<Long> can satisfy any of these unit tests; a minimal demonstration follows the list below. I believe there are (at least) 5 instances of this error:

HumanEval_90_next_smallest
HumanEval_162_string_to_md5
HumanEval_136_largest_smallest_integers
HumanEval_12_longest
HumanEval_128_prod_signs
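
To make the failure mode concrete, here is a minimal standalone sketch (not taken from the dataset) showing that an Optional<Long> result can never equal a bare long under equals, so the original asserts are unsatisfiable for a correctly typed implementation:

    import java.util.Optional;

    public class OptionalEqualsDemo {
        public static void main(String[] args) {
            Optional<Long> result = Optional.of(2L);

            // Optional.equals is only true for another Optional holding an equal value,
            // so comparing against a bare long (autoboxed to Long) is always false.
            System.out.println(result.equals(2L));              // false
            System.out.println(result.equals(Optional.of(2L))); // true
        }
    }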

This looks like it may carry over into the C++ transpiler as well, so it would probably be best if the author could make some quick corrections here.

Thanks so much!

Thank you for reporting this. I will soon have a fix for this.

A little update on this. I have not yet updated the MultiPL-E dataset on the HF Hub; however, I have updated the affected tests in this repo. I have a small script that identifies cases where the test cases in completions files do not match the test cases in this repo:

https://github.com/nuprl/MultiPL-E/blob/main/check_test_consistency.py
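
The actual script at the link above is a Python tool; purely as an illustration of the idea, a comparable check might look like the sketch below. The per-problem plain-text file layout and the two directory arguments are assumptions made for illustration, not the repo's real file format.

    import java.io.IOException;
    import java.nio.file.*;

    public class CheckTestConsistency {
        // Hypothetical layout: one plain-text test file per problem in each directory.
        public static void main(String[] args) throws IOException {
            Path completionTests = Paths.get(args[0]); // tests extracted from completions files
            Path repoTests = Paths.get(args[1]);       // reference tests from this repo

            try (DirectoryStream<Path> stream = Files.newDirectoryStream(completionTests)) {
                for (Path extracted : stream) {
                    Path reference = repoTests.resolve(extracted.getFileName());
                    if (!Files.exists(reference)) {
                        System.out.println("no reference test for " + extracted.getFileName());
                    } else if (!Files.readString(extracted).equals(Files.readString(reference))) {
                        System.out.println("test mismatch: " + extracted.getFileName());
                    }
                }
            }
        }
    }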

On Java, it correctly identifies exactly the files that @PootieT reported.

On a certain model, this improved pass@1 from 25.83% to 25.97% 27.21%.
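
For readers unfamiliar with the metric: pass@1 is the fraction of sampled completions that pass the tests, averaged over problems (the k = 1 case of the standard pass@k estimator). A minimal sketch of the computation, with made-up counts:

    public class PassAtOne {
        // Unbiased pass@1: for each problem, the fraction of samples that pass,
        // averaged over all problems (the k = 1 case of the usual pass@k estimator).
        static double passAt1(int[] numSamples, int[] numCorrect) {
            double sum = 0.0;
            for (int i = 0; i < numSamples.length; i++) {
                sum += (double) numCorrect[i] / numSamples[i];
            }
            return sum / numSamples.length;
        }

        public static void main(String[] args) {
            // Hypothetical counts for three problems, 20 samples each.
            System.out.println(passAt1(new int[]{20, 20, 20}, new int[]{5, 0, 12}));
        }
    }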