nuprl/MultiPL-E

Racket unit test numerical equivalence

PootieT opened this issue · 4 comments

Example program: HumanEval_99_closest_integer

This is the current test

    (check-equal? (candidate "14.5") 15)

which outputs:

--------------------
FAILURE
name:       check-equal?
location:   problem.rkt:27:4
actual:     15.0
expected:   15
--------------------

Here are some alternatives we may consider (source):

    (check = (candidate "14.5") 15)
    (check-= (candidate "14.5") 15 0.01)
    (check-within (candidate "14.5") 15 0.01)

All of them would pass with the same inputs. The second and third versions check equality within a small error tolerance.
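For context on why the original test fails: check-equal? compares with equal?, which treats the exact integer 15 and the inexact 15.0 as different values, whereas = compares them numerically. The sketch below illustrates this. The candidate function here is a hypothetical stand-in for a model-generated solution that happens to return a float; it is not the benchmark's reference solution.

```racket
#lang racket
(require rackunit)

;; Hypothetical stand-in for a generated solution: rounds half away
;; from zero and returns an inexact number whenever 0.5 is added.
(define (candidate s)
  (let ([x (string->number s)])
    (if (>= x 0)
        (floor (+ x 0.5))
        (ceiling (- x 0.5)))))

;; equal? distinguishes exact 15 from inexact 15.0 ...
(check-false (equal? (candidate "14.5") 15))

;; ... while the numeric comparisons below all pass silently.
(check = (candidate "14.5") 15)
(check-= (candidate "14.5") 15 0.01)
(check-within (candidate "14.5") 15 0.01)
```

Run as a module, this should print nothing, since rackunit only reports failing checks.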

Agreed. But, we may need to generalize this to work on lists of numbers as well.

It seems that check-within also supports comparing lists:

    (check-within (list 0 2.0 3 5 9 123) (list 0 2 3 5 9 123) 0.01) ; passes

There is one odd case, though: one program returned a set whose elements all match the expected output, while the expected value is a list. None of the current checking methods treats the two values as equal. Perhaps that is for the best.

    (check-match (set 0 2 3 5 9 123) (list 0 2 3 5 9 123)) ; does not pass
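A small sketch of the mismatch, using the same values as above. It only illustrates that the container type is what differs; the names actual and expected are placeholders, and the thread does not propose normalizing sets to lists.

```racket
#lang racket

;; Placeholder names for the program's output and the test's expectation.
(define expected (list 0 2 3 5 9 123))
(define actual (set 0 2 3 5 9 123))

;; A set is never equal? to a list, whatever the elements are.
(equal? actual expected) ; => #f

;; The elements themselves do match once the set is listed and sorted.
(equal? (sort (set->list actual) <) expected) ; => #t
```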

Conveniently, it seems like check-within supports heterogeneous lists too:

Welcome to Racket v8.2 [cs].
> (require rackunit)
> (check-within '("hi" 2) '("hi" 2.001) 0.05)
> (check-within '("hi" 2) '("hi" 2.1) 0.05)
--------------------
; FAILURE [,bt for context]
name:       check-within
location:   readline-input:3:0
actual:     '("hi" 2)
expected:   '("hi" 2.1)
--------------------
> 

So it looks like we can simply replace check-equal? with check-within.

Fixed. Racket performance on a model increases slightly from 10.62% to 11.19%. I suspect with better Racket training data, it will have more of an impact.