C++ test float comparison
PootieT opened this issue · 3 comments
Example: HumanEval_45_triangle_area
float triangle_area(long a, long b, long c) {
// C++ program
}
int main() {
auto candidate = triangle_area;
assert(candidate((3), (4), (5)) == (6.0));
assert(candidate((1), (2), (10)) == (float(-1)));
assert(candidate((4), (8), (5)) == (8.18));
assert(candidate((2), (2), (2)) == (1.73));
assert(candidate((1), (2), (3)) == (float(-1)));
assert(candidate((10), (5), (7)) == (16.25));
assert(candidate((2), (6), (3)) == (float(-1)));
assert(candidate((1), (1), (1)) == (0.43));
assert(candidate((2), (2), (10)) == (float(-1)));
}
When comparing float outputs, the tests often fail (in this case, the third assertion failed for me). I think this is because C++ treats a literal like 8.18 as a double, which then does not match the program output (a float).
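A minimal sketch of the mismatch described above, assuming the candidate returns exactly the float nearest to 8.18. The helper names are hypothetical and exist only for illustration; a real candidate may round differently.

```cpp
// Hypothetical stand-in for a candidate that returns float, as in the tests above.
float candidate_result() { return 8.18f; }

// Mirrors the failing assertion: the float is promoted to double before the
// comparison, so it is checked against the double rounding of 8.18.
bool equals_double_literal() { return candidate_result() == 8.18; }

// Comparing against a float literal keeps both sides at float precision.
bool equals_float_literal() { return candidate_result() == 8.18f; }
```

Under these assumptions, `equals_double_literal()` is false while `equals_float_literal()` is true, which matches the behaviour of the third assertion.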
There is at least one other failure point, in HumanEval_4_mean_absolute_deviation, but there could be many more.
float mean_absolute_deviation(std::vector<float> numbers) {
// C++ program
}
int main() {
auto candidate = mean_absolute_deviation;
assert(candidate((std::vector<float>({(float)1.0, (float)2.0}))) == (0.5));
assert(candidate((std::vector<float>({(float)1.0, (float)2.0, (float)3.0, (float)4.0}))) == (1.0));
assert(candidate((std::vector<float>({(float)1.0, (float)2.0, (float)3.0, (float)4.0, (float)5.0}))) == (1.2));
}
Thanks for pointing this out! CC @abhijangda
Thank you for pointing this out.
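Since double has higher precision than float, a decimal value that is not exactly representable in binary (such as 2.18) rounds to a different number as a float than as a double, so the two compare unequal. Values that are exactly representable (such as 1.0 or 0.0) compare equal.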
For example:
```cpp
#include <iostream>

int main() {
    // Each comparison promotes the float literal to double before comparing.
    std::cout << "eq 1 " << (1.0 == 1.0f) << std::endl;
    std::cout << "eq 0 " << (0.0 == 0.0f) << std::endl;
    std::cout << "eq 2 " << (2.18 == 2.18f) << std::endl;
}
```
Gives the output:
eq 1 1
eq 0 1
eq 2 0
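To probe which literals compare equal, here is a small sketch that checks a few more values. What seems to matter is whether the decimal is exactly representable in binary, not how many digits follow the decimal point: 0.25 has two decimal digits and still matches, while 1.2 has one and does not. The `Case` struct and function names are hypothetical.

```cpp
#include <iostream>

// One decimal value written both as a double literal and as a float literal.
struct Case {
    const char* text;
    double as_double;
    float as_float;
};

const Case kCases[] = {
    {"0.5", 0.5, 0.5f},
    {"0.25", 0.25, 0.25f},
    {"1.2", 1.2, 1.2f},
    {"2.18", 2.18, 2.18f},
};

// True when the float rounding, promoted to double, equals the double rounding.
bool literals_match(const Case& c) {
    return c.as_double == static_cast<double>(c.as_float);
}

// Prints one line per case: the literal text and 1 (match) or 0 (mismatch).
void print_cases() {
    for (const Case& c : kCases) {
        std::cout << c.text << " " << literals_match(c) << "\n";
    }
}
```

Calling `print_cases()` from `main` lists each literal together with the comparison result.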
I will soon push the fix.
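The thread does not say what the fix looks like. One common way to make such assertions robust, shown here only as a sketch and not necessarily the change that was pushed, is to compare within a tolerance. The helper `approx_equal` and its default tolerance are assumptions, not part of the benchmark.

```cpp
#include <cmath>

// Hypothetical helper: true when a and b differ by at most `tol`.
// Both arguments are taken as double, so a float result is promoted first.
bool approx_equal(double a, double b, double tol = 1e-4) {
    return std::fabs(a - b) <= tol;
}
```

With such a helper, the third assertion could be written as `assert(approx_equal(candidate(4, 8, 5), 8.18));`. The tolerance of 1e-4 is an arbitrary choice that suits expected values rounded to two decimals.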
I believe these are the affected files for C++:
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_0_has_close_elements.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_133_sum_squares.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_137_compare_one.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_151_double_the_difference.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_20_find_closest_elements.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_21_rescale_to_unit.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_22_filter_integers.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_2_truncate_number.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_45_triangle_area.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_47_median.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_4_mean_absolute_deviation.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_71_triangle_area.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_81_numerical_letter_grade.results.json.gz
../experiments/humaneval-cpp-bigcode_15b_800m-0.2-reworded/HumanEval_92_any_int.results.json.gz
pass@1 increases from 27.15% to 27.61% on this model.