Racket unit test numerical equivalence

Question

PootieT opened this issue a year ago · 4 comments

Example program: HumanEval_99_closest_integer

This is the current test:

    (check-equal? (candidate "14.5") 15)

which outputs:

--------------------
FAILURE
name:       check-equal?
location:   problem.rkt:27:4
actual:     15.0
expected:   15
--------------------

Here are some alternatives we may consider (source):

    (check = (candidate "14.5") 15)
    (check-= (candidate "14.5") 15 0.01)
    (check-within (candidate "14.5") 15 0.01)

All of them would pass with the same inputs. The second and third versions check equivalence within a small error range.
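For context, the failure appears to come from check-equal? comparing with equal?, which treats the exact integer 15 and the inexact float 15.0 as different values, while = compares them numerically. A minimal sketch (the variable names are only for illustration):

```racket
#lang racket
(require rackunit)

;; equal? distinguishes exact and inexact numbers, so 15.0 and 15 differ.
(define equal-result (equal? 15.0 15))   ; #f
;; = compares numeric value only, so the two are the same.
(define numeric-result (= 15.0 15))      ; #t

(check-false equal-result)
(check-true numeric-result)

;; Both tolerance-based checks accept the float result.
(check-= 15.0 15 0.01)
(check-within 15.0 15 0.01)
```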

Answer 1 · 2023-04-22T00:30:31.000Z

Agreed. But, we may need to generalize this to work on lists of numbers as well.

Answer 2 · 2023-04-22T14:29:18.000Z

It seems like check-within also allows comparison between lists:

    (check-within (list 0 2.0 3 5 9 123) (list 0 2 3 5 9 123) 0.01) ; passes

However, in one odd case, a program returned a set whose elements all match the expected output, while the expected output is a list. No current checking method treats the two values as equal. Perhaps that is for the best.

    (check-match (set 0 2 3 5 9 123) (list 0 2 3 5 9 123)) ; does not pass
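If accepting a set in place of a list were ever wanted, one option would be to normalize the result before comparing. A rough sketch, assuming a hypothetical helper set->sorted-list that is not part of the current test harness:

```racket
#lang racket
(require rackunit)

;; Hypothetical helper: turn a set of numbers into an ascending list
;; so that it can be compared against an expected list.
(define (set->sorted-list s)
  (sort (set->list s) <))

(define expected (list 0 2 3 5 9 123))
(define actual (set 0 2 3 5 9 123))

;; A set is never equal? to a list, even with the same elements.
(check-false (equal? actual expected))

;; After normalizing, the usual tolerance-based check applies.
(check-within (set->sorted-list actual) expected 0.01)
```

This only makes sense when element order does not matter for the problem, which is one reason to leave the strict behavior as it is.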

Answer 3 · 2023-04-23T13:13:07.000Z

Conveniently, it seems like check-within supports heterogeneous lists too:

Welcome to Racket v8.2 [cs].
> (require rackunit)
> (check-within '("hi" 2) '("hi" 2.001) 0.05)
> (check-within '("hi" 2) '("hi" 2.1) 0.05)
--------------------
; FAILURE [,bt for context]
name:       check-within
location:   readline-input:3:0
actual:     '("hi" 2)
expected:   '("hi" 2.1)
--------------------
>

So we should be able to simply replace check-equal? with check-within.
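As a sketch of what the converted test could look like: the candidate below is a hypothetical stand-in for a model-generated solution to HumanEval_99_closest_integer (not the benchmark's reference solution), written so that it returns a float the way the failing program did. The 0.01 tolerance is taken from the examples above.

```racket
#lang racket
(require rackunit)

;; Hypothetical solution: parse the string and round half away from zero.
;; The result is a float (e.g. 15.0) because 0.5 is inexact.
(define (candidate s)
  (define x (string->number s))
  (if (>= x 0)
      (floor (+ x 0.5))
      (ceiling (- x 0.5))))

;; check-equal? would reject 15.0 against 15; check-within accepts it.
(check-within (candidate "14.5") 15 0.01)
(check-within (candidate "-15.5") -16 0.01)

;; The same check also works when results are collected in a list.
(check-within (list (candidate "14.5") (candidate "10")) (list 15 10) 0.01)
```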

Answer 4 · 2023-04-23T23:03:10.000Z

Fixed. Racket performance on a model increases slightly from 10.62% to 11.19%. I suspect with better Racket training data, it will have more of an impact.