Tests not passed
Jerry-Master opened this issue · 13 comments
I followed your instructions and one test did not pass. It says the following:
FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/mnt/array50tb/projects/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]
First differing element 0:
-1.1633697
-1.1860729455947876
- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]
----------------------------------------------------------------------
Ran 2 tests in 82.530s
FAILED (failures=1)
What could be causing it?
I also encountered the same problem.
FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/bytedance/repo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
self.assertEqual(
AssertionError: Lists differ: [-1.162605, -1.1078743] != [-1.1860729455947876, -1.1022869348526]
First differing element 0:
-1.162605
-1.1860729455947876
- [-1.162605, -1.1078743]
+ [-1.1860729455947876, -1.1022869348526]
----------------------------------------------------------------------
Ran 2 tests in 37.484s
FAILED (failures=1)
Same error
FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/export/data/username/imo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]
First differing element 0:
-1.1633697
-1.1860729455947876
- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]
----------------------------------------------------------------------
Ran 2 tests in 126.464s
FAILED (failures=1)
It seems the meliad library is not numerically stable, giving different scores for different users. I will put a note in the README (a8a1dc7).
For now, it seems the small difference in score does not affect run.sh or any of the other tests in run_tests.sh. I will let this test fail while we learn more about the meliad implementation and outputs.
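For anyone who wants the check to pass locally, here is a minimal sketch of a tolerance-based comparison. This is not the repo's test, only an illustration that the scores reported above are within a couple of percent of the reference values; the name scores_close and the 5% tolerance are assumptions.

```python
# Sketch only (not the repo's test): compare scores against the reference
# values with a relative tolerance instead of exact equality.
import math

# Reference scores from the expected output in lm_inference_test.py.
EXPECTED = [-1.1860729455947876, -1.1022869348526]

def scores_close(scores, expected=EXPECTED, rel_tol=0.05):
    """True if every score is within 5% (relative) of its reference value."""
    return len(scores) == len(expected) and all(
        math.isclose(s, e, rel_tol=rel_tol) for s, e in zip(scores, expected)
    )

# One of the failing runs reported above is only ~2% off, so this passes.
print(scores_close([-1.1633697, -1.122621]))  # True
```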
same here:
FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
Traceback (most recent call last):
File "/Users/Documents/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
self.assertEqual(
AssertionError: Lists differ: [-1.1898218, -1.1082345] != [-1.1860729455947876, -1.1022869348526]
First differing element 0:
-1.1898218
-1.1860729455947876
- [-1.1898218, -1.1082345]
+ [-1.1860729455947876, -1.1022869348526]
Ran 2 tests in 82.937s
same here
Traceback (most recent call last):
File "/home/user/python_code/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
self.assertEqual(
AssertionError: Lists differ: [-1.1563942, -1.1297226] != [-1.1860729455947876, -1.1022869348526]
First differing element 0:
-1.1563942
-1.1860729455947876
- [-1.1563942, -1.1297226]
+ [-1.1860729455947876, -1.1022869348526]
Ran 2 tests in 62.007s
FAILED (failures=1)
@thtrieu I have encountered the same error here. Indeed, it does not affect the other tests in run_tests.sh or the orthocenter problem in run.sh.
However, I find it does not succeed on Olympiad geometry. For example, when solving 2019 p6, the program terminates early with "DD+AR failed to solve the problem." and no new LM output is generated. (There is no error or traceback to go on, which is strange. Full output log: https://drive.google.com/file/d/1btni6zroBbDLz6OBMpifTyj74bjL5fFy/view?usp=drive_link)
Could you please share the specific hardware you used to run all the Olympiad geometry problems successfully? (I use Ubuntu 20.04, Python 3.10.12, a 64-core vCPU, and 2× NVIDIA A10 (24 GB), but fail to reproduce the results.)
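One quick sanity check before suspecting the hardware: confirm that JAX actually sees both A10 GPUs, otherwise the LM silently runs on CPU. A minimal diagnostic sketch using standard jax calls, nothing repo-specific:

```python
# Diagnostic sketch: report which backend and devices JAX is using.
import jax

print("jax", jax.__version__)
print("default backend:", jax.default_backend())   # typically "gpu" when CUDA is set up
print("devices:", jax.devices())                    # should list both A10 GPUs
print("local device count:", jax.local_device_count())
```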
======================================================================
FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
Traceback (most recent call last):
File "/home/notebook/code/personal/80306170/AGI/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]
First differing element 0:
-1.1633697
-1.1860729455947876
- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]
Ran 2 tests in 82.584s
FAILED (failures=1)
Ubuntu 18.04, PyTorch 2.1 (CUDA 11.8), A100 80GB
Same here; the only test that does not pass when executing bash run_tests.sh is test_lm_score_may_fail_numerically_for_external_meliad.
My specific numbers:
AssertionError: Lists differ: [-1.1831452, -1.112445] != [-1.1860729455947876, -1.1022869348526]
First differing element 0:
-1.1831452
-1.1860729455947876
- [-1.1831452, -1.112445]
+ [-1.1860729455947876, -1.1022869348526]
My setup: Apple M1, macOS Ventura 13.6.1, Python 3.10.8, tensorflow 2.13.0
Problems solved using Colab!
@soxziw I'm running this in Google Colab and I got the exact same failure when running run_tests.sh
FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/content/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
self.assertEqual(
AssertionError: Lists differ: [-1.1527003, -1.1230755] != [-1.1860729455947876, -1.1022869348526]
First differing element 0:
-1.1527003
-1.1860729455947876
- [-1.1527003, -1.1230755]
+ [-1.1860729455947876, -1.1022869348526]
What kind of instance or GPU did you get when your tests were passing?
The free TPU
Hello, did you switch to a different version of jax? I am unable to use the TPU with the dependencies in requirements.txt. I don't know whether this is down to meliad, but it has prevented me from reproducing the paper's results on GPU or CPU as well.
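In case it helps, a small sketch for checking whether the jax/jaxlib build installed from requirements.txt can see the Colab TPU at all; the install hint in the comment is the usual jax TPU recipe, which may clash with the pinned versions:

```python
# Sketch: check whether the installed jax/jaxlib build can see the Colab TPU.
import jax
import jaxlib

print("jax", jax.__version__, "jaxlib", jaxlib.__version__)
try:
    print("TPU devices:", jax.devices("tpu"))
except Exception as err:  # raised when this jax/jaxlib build has no TPU backend
    print("No TPU backend available:", err)

# If no TPU backend is available, a TPU-enabled build is usually installed with
# something like
#   pip install "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
# which may conflict with the exact versions pinned in requirements.txt.
```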
Oh, I ran into the same issue here: