google-deepmind/alphageometry

Tests not passed

Jerry-Master opened this issue · 13 comments

I followed your instructions, and one test did not pass. It says the following:

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/array50tb/projects/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 82.530s

FAILED (failures=1)
****

What could be causing it?

I also encountered the same problem.

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bytedance/repo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.162605, -1.1078743] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.162605
-1.1860729455947876

- [-1.162605, -1.1078743]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 37.484s

FAILED (failures=1)

Same error

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/export/data/username/imo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 126.464s

FAILED (failures=1)

It seems the meliad library is not numerically stable, giving different scores for different users. I will put a note in the README (a8a1dc7).
For now, the small difference in scores does not affect run.sh or any of the other tests in run_tests.sh,
so I will let this test fail while we learn more about the meliad implementation and outputs.
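For anyone who wants to see why the failure is considered benign: a minimal sketch (not from the repo; the reference values are the ones hard-coded in the test, and the 3% tolerance is my own guess) showing that the machine-dependent scores all agree with the expected ones up to a small relative tolerance, even though exact list equality fails:

```python
import math

# Reference scores hard-coded in lm_inference_test.py
expected = [-1.1860729455947876, -1.1022869348526]
# Scores observed on one of the machines in this thread
actual = [-1.1633697, -1.122621]

# Exact equality (what assertEqual checks) fails across machines:
print(actual == expected)  # False

# But a relative tolerance of ~3% covers every result reported here:
close = all(math.isclose(a, e, rel_tol=0.03) for a, e in zip(actual, expected))
print(close)  # True
```

This is why the small drift is harmless for run.sh: beam search only ranks candidates by score, and a ~2% shift does not change the ranking.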

same here:

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)

Traceback (most recent call last):
  File "/Users/Documents/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1898218, -1.1082345] != [-1.1860729455947876, -1.1022869348526]
 
First differing element 0:
-1.1898218
-1.1860729455947876
 

- [-1.1898218, -1.1082345]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 82.937s

same here

Traceback (most recent call last):
  File "/home/user/python_code/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1563942, -1.1297226] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1563942
-1.1860729455947876

- [-1.1563942, -1.1297226]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 62.007s

FAILED (failures=1)

soxziw commented

It seems the meliad library is not numerically stable, giving different scores for different users. I will put a note in the README (a8a1dc7) For now, it seems the small difference in score does not affect run.sh and all other tests in run_tests.sh, I will let this test fail while we learn more about meliad implementation and outputs.

@thtrieu I have encountered the same error here. Indeed, it does not affect other tests in run_tests.sh and the orthocenter problem in run.sh.

However, it does not succeed on the Olympiad geometry problems. For example, when solving 2019 p6, the program terminates early with DD+AR failed to solve the problem. without generating any new LM output. (No reason or error traceback... so weird.) (full output log: https://drive.google.com/file/d/1btni6zroBbDLz6OBMpifTyj74bjL5fFy/view?usp=drive_link)

Could you please share the specific hardware you used to run all the Olympiad geometry problems successfully? (I use Ubuntu 20.04, Python 3.10.12, a 64-core vCPU, and 2x NVIDIA A10 (24GB), but fail to reproduce the results.)

======================================================================
FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)

Traceback (most recent call last):
  File "/home/notebook/code/personal/80306170/AGI/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 82.584s

FAILED (failures=1)

Ubuntu 18.04, PyTorch 2.1 with CUDA 11.8, A100 80GB

Same here, the only test that does not pass when executing bash run_tests.sh is test_lm_score_may_fail_numerically_for_external_meliad.

My specific numbers:

AssertionError: Lists differ: [-1.1831452, -1.112445] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1831452
-1.1860729455947876

- [-1.1831452, -1.112445]
+ [-1.1860729455947876, -1.1022869348526]

My setup: Apple M1, macOS Ventura 13.6.1, Python 3.10.8, tensorflow 2.13.0


Problems solved using Colab!

@soxziw I'm running this in Google Colab and I got the exact same failure when running run_tests.sh

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/content/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1527003, -1.1230755] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1527003
-1.1860729455947876

- [-1.1527003, -1.1230755]
+ [-1.1860729455947876, -1.1022869348526]

What kind of instance or GPU did you get when your tests were passing?


The free TPU


Hello, did you switch to a different version of jax? I am unable to use the TPU with the dependencies pinned in requirements.txt. I also don't know whether it is meliad that prevented me from reproducing the paper's results on GPU or CPU.
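As a first sanity check (this is generic jax usage, not something from this repo), you can ask jax which backend it actually picked up; if the TPU runtime is not visible, jax silently falls back to CPU:

```python
import jax

# List the accelerators jax can see; on a working TPU runtime this
# shows TpuDevice entries, otherwise CpuDevice/GpuDevice.
print(jax.devices())

# The backend jax will run on by default: 'tpu', 'gpu', or 'cpu'.
print(jax.default_backend())
```

If this prints `cpu` on a Colab TPU instance, the jax/libtpu versions are mismatched, and no change to the alphageometry code will help.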

OHHHHH, Here we met:

#143