nuprl/MultiPL-E

Java transpiled test failing with optional output


For example, with HumanEval_90_next_smallest, the Java-transpiled signature is

public static Optional<Long> nextSmallest(ArrayList<Long> lst) {

However, in the unit tests below, the expected value is not wrapped in Optional.of when the output is non-empty:

    public static void main(String[] args) {
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)1l, (long)2l, (long)3l, (long)4l, (long)5l)))).equals(2l));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)5l, (long)1l, (long)4l, (long)3l, (long)2l)))).equals(2l));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList()))).equals(Optional.empty()));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)1l, (long)1l)))).equals(Optional.empty()));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)1l, (long)1l, (long)1l, (long)1l, (long)0l)))).equals(1l));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)1l, (long)1l)))).equals(Optional.empty()));
    assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)-35l, (long)34l, (long)12l, (long)-45l)))).equals(-35l));
    }

where the first assert should have been

assert(nextSmallest((new ArrayList<Long>(Arrays.asList((long)1l, (long)2l, (long)3l, (long)4l, (long)5l)))).equals(Optional.of(2l)));

Otherwise, no generated function with return type Optional<Long> can satisfy any of these unit tests; a minimal demonstration follows the list below. I believe there are (at least) 5 instances of this error:

HumanEval_90_next_smallest
HumanEval_162_string_to_md5
HumanEval_136_largest_smallest_integers
HumanEval_12_longest
HumanEval_128_prod_signs
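
To make the failure mode concrete, here is a minimal standalone sketch (not taken from the dataset) showing that an Optional<Long> result can never equal a bare long under equals, so the original asserts are unsatisfiable for a correctly typed implementation:

    import java.util.Optional;

    public class OptionalEqualsDemo {
        public static void main(String[] args) {
            Optional<Long> result = Optional.of(2L);

            // Optional.equals is only true for another Optional holding an equal value,
            // so comparing against a bare long (autoboxed to Long) is always false.
            System.out.println(result.equals(2L));              // false
            System.out.println(result.equals(Optional.of(2L))); // true
        }
    }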

This looks like it may carry over into the C++ transpiler as well, so it would probably be best if the author could make some quick corrections here.

Thanks so much!

Thank you for reporting this. I will soon have a fix for this.

A little update on this. I have not yet updated the MultiPL-E dataset on the HF Hub; however, I have updated the affected tests in this repo. I have a small script that identifies cases where the test cases in completions files do not match the test cases in this repo:

https://github.com/nuprl/MultiPL-E/blob/main/check_test_consistency.py
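
The actual script at the link above is a Python tool; purely as an illustration of the idea, a comparable check might look like the sketch below. The per-problem plain-text file layout and the two directory arguments are assumptions made for illustration, not the repo's real file format.

    import java.io.IOException;
    import java.nio.file.*;

    public class CheckTestConsistency {
        // Hypothetical layout: one plain-text test file per problem in each directory.
        public static void main(String[] args) throws IOException {
            Path completionTests = Paths.get(args[0]); // tests extracted from completions files
            Path repoTests = Paths.get(args[1]);       // reference tests from this repo

            try (DirectoryStream<Path> stream = Files.newDirectoryStream(completionTests)) {
                for (Path extracted : stream) {
                    Path reference = repoTests.resolve(extracted.getFileName());
                    if (!Files.exists(reference)) {
                        System.out.println("no reference test for " + extracted.getFileName());
                    } else if (!Files.readString(extracted).equals(Files.readString(reference))) {
                        System.out.println("test mismatch: " + extracted.getFileName());
                    }
                }
            }
        }
    }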

On Java, it correctly identifies exactly the files that @PootieT reported.

On a certain model, this improved pass@1 from 25.83% to 25.97% 27.21%.
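
For readers unfamiliar with the metric: pass@1 is the fraction of sampled completions that pass the tests, averaged over problems (the k = 1 case of the standard pass@k estimator). A minimal sketch of the computation, with made-up counts:

    public class PassAtOne {
        // Unbiased pass@1: for each problem, the fraction of samples that pass,
        // averaged over all problems (the k = 1 case of the usual pass@k estimator).
        static double passAt1(int[] numSamples, int[] numCorrect) {
            double sum = 0.0;
            for (int i = 0; i < numSamples.length; i++) {
                sum += (double) numCorrect[i] / numSamples[i];
            }
            return sum / numSamples.length;
        }

        public static void main(String[] args) {
            // Hypothetical counts for three problems, 20 samples each.
            System.out.println(passAt1(new int[]{20, 20, 20}, new int[]{5, 0, 12}));
        }
    }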