Racket unit test numerical equivalence
PootieT opened this issue · 4 comments
Example program: HumanEval_99_closest_integer
This is the current test
(check-equal? (candidate "14.5") 15)
which outputs:
--------------------
FAILURE
name: check-equal?
location: problem.rkt:27:4
actual: 15.0
expected: 15
--------------------
Here are some alternatives we may consider (source):
(check = (candidate "14.5") 15)
(check-= (candidate "14.5") 15 0.01)
(check-within (candidate "14.5") 15 0.01)
All of them pass with the same inputs. The second and third versions check equality within a small tolerance.
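As a minimal sketch of why the original test fails: `check-equal?` compares with `equal?`, which treats the exact integer `15` and the inexact `15.0` as different values, while `=` compares them numerically. The literal values below stand in for the result of `(candidate "14.5")`.

```racket
#lang racket
(require rackunit)

;; equal? distinguishes exact 15 from inexact 15.0 ...
(check-false (equal? 15.0 15))
;; ... while = compares the numeric values.
(check-true (= 15.0 15))

;; The tolerance-based checks accept the inexact result as well.
(check-= 15.0 15 0.01)
(check-within 15.0 15 0.01)
```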
Agreed. But, we may need to generalize this to work on lists of numbers as well.
It seems like `check-within` allows comparison between lists:
(check-within (list 0 2.0 3 5 9 123) (list 0 2 3 5 9 123) 0.01) ; passes
Although, in one odd case, a program returned a set whose elements all matched the expected output, which was a list. No current checking method treats the two values as equal. Perhaps for the best.
(check-match (set 0 2 3 5 9 123) (list 0 2 3 5 9 123)) ; does not pass
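A small sketch of the mismatch, assuming the values from the case above: the set and the list hold the same elements, but they are different data types, so `equal?` reports them as different. The names `actual` and `expected` are illustrative only.

```racket
#lang racket
(require rackunit)

(define actual (set 0 2 3 5 9 123))    ; what the program returned
(define expected (list 0 2 3 5 9 123)) ; what the test expects

;; Different data types, so equal? is #f despite matching elements.
(check-false (equal? actual expected))

;; The elements themselves do match.
(check-true (for/and ([x expected]) (set-member? actual x)))
(check-equal? (set-count actual) (length expected))
```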
Conveniently, it seems like `check-within` supports heterogeneous lists too:
Welcome to Racket v8.2 [cs].
> (require rackunit)
> (check-within '("hi" 2) '("hi" 2.001) 0.05)
> (check-within '("hi" 2) '("hi" 2.1) 0.05)
--------------------
; FAILURE [,bt for context]
name: check-within
location: readline-input:3:0
actual: '("hi" 2)
expected: '("hi" 2.1)
--------------------
>
So, we should be able to just use `check-within` instead of `check-equal?`.
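For illustration, a translated test could then look roughly like the sketch below. The `candidate` definition is a hypothetical stand-in for a model-generated solution that returns an inexact number, and the `0.01` tolerance is an assumption borrowed from the examples above, not necessarily the value used in the fix.

```racket
#lang racket
(require rackunit)

;; Hypothetical stand-in for a generated solution: rounds half away
;; from zero and returns an inexact number such as 15.0.
(define (candidate s)
  (define x (exact->inexact (string->number s)))
  (if (>= x 0)
      (floor (+ x 0.5))
      (ceiling (- x 0.5))))

;; These pass even though the expected values are exact integers.
(check-within (candidate "14.5") 15 0.01)
(check-within (candidate "-15.5") -16 0.01)
(check-within (candidate "10") 10 0.01)
```

With `check-equal?` in place of `check-within`, each of these would fail for this stand-in, because it returns `15.0` rather than `15`.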
Fixed. Racket performance on a model increases slightly from 10.62% to 11.19%. I suspect with better Racket training data, it will have more of an impact.