google-deepmind/alphageometry

Tests not passed

Jerry-Master opened this issue · 13 comments

I followed your instructions, and one test did not pass. It says the following:

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/array50tb/projects/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 82.530s

FAILED (failures=1)
****

What could be causing it?

I also encountered the same problem.

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bytedance/repo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.162605, -1.1078743] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.162605
-1.1860729455947876

- [-1.162605, -1.1078743]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 37.484s

FAILED (failures=1)

Same error

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/export/data/username/imo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 126.464s

FAILED (failures=1)

It seems the meliad library is not numerically stable, giving different scores for different users. I will put a note in the README (a8a1dc7).
For now, the small difference in scores does not affect run.sh or any of the other tests in run_tests.sh,
so I will let this test fail while we learn more about the meliad implementation and outputs.
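For anyone who wants to see why the failure is considered benign: a minimal sketch (not from the repo; the reference values are the ones hard-coded in the test, and the 3% tolerance is my own guess) showing that the machine-dependent scores all agree with the expected ones up to a small relative tolerance, even though exact list equality fails:

```python
import math

# Reference scores hard-coded in lm_inference_test.py
expected = [-1.1860729455947876, -1.1022869348526]
# Scores observed on one of the machines in this thread
actual = [-1.1633697, -1.122621]

# Exact equality (what assertEqual checks) fails across machines:
print(actual == expected)  # False

# But a relative tolerance of ~3% covers every result reported here:
close = all(math.isclose(a, e, rel_tol=0.03) for a, e in zip(actual, expected))
print(close)  # True
```

This is why the small drift is harmless for run.sh: beam search only ranks candidates by score, and a ~2% shift does not change the ranking.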

same here:

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)

Traceback (most recent call last):
  File "/Users/Documents/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1898218, -1.1082345] != [-1.1860729455947876, -1.1022869348526]
 
First differing element 0:
-1.1898218
-1.1860729455947876
 

- [-1.1898218, -1.1082345]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 82.937s

same here

Traceback (most recent call last):
  File "/home/user/python_code/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1563942, -1.1297226] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1563942
-1.1860729455947876

- [-1.1563942, -1.1297226]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 62.007s

FAILED (failures=1)

soxziw commented

It seems the meliad library is not numerically stable, giving different scores for different users. I will put a note in the README (a8a1dc7) For now, it seems the small difference in score does not affect run.sh and all other tests in run_tests.sh, I will let this test fail while we learn more about meliad implementation and outputs.

@thtrieu I have encountered the same error here. Indeed, it does not affect other tests in run_tests.sh and the orthocenter problem in run.sh.

However, it does not succeed on the Olympiad geometry problems. For example, when solving 2019 p6, the program terminates early with DD+AR failed to solve the problem. without generating any new LM output. (No reason or error traceback... so weird.) (full output log: https://drive.google.com/file/d/1btni6zroBbDLz6OBMpifTyj74bjL5fFy/view?usp=drive_link)

Could you please share the specific hardware you used to run all the Olympiad geometry problems successfully? (I use Ubuntu 20.04, Python 3.10.12, a 64-core vCPU, and 2x NVIDIA A10 (24GB), but fail to reproduce the results.)

======================================================================
FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)

Traceback (most recent call last):
  File "/home/notebook/code/personal/80306170/AGI/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 82.584s

FAILED (failures=1)

Ubuntu 18.04, PyTorch 2.1 with CUDA 11.8, A100 80GB

Same here, the only test that does not pass when executing bash run_tests.sh is test_lm_score_may_fail_numerically_for_external_meliad.

My specific numbers:

AssertionError: Lists differ: [-1.1831452, -1.112445] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1831452
-1.1860729455947876

- [-1.1831452, -1.112445]
+ [-1.1860729455947876, -1.1022869348526]

My setup: Apple M1, macOS Ventura 13.6.1, Python 3.10.8, tensorflow 2.13.0


Problems solved using Colab!

@soxziw I'm running this in Google Colab and I got the exact same failure when running run_tests.sh

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/content/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1527003, -1.1230755] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1527003
-1.1860729455947876

- [-1.1527003, -1.1230755]
+ [-1.1860729455947876, -1.1022869348526]

What kind of instance or GPU did you get when your tests were passing?


The free TPU


Hello, did you switch to a different version of jax? I am unable to use the TPU with the dependencies pinned in requirements.txt. I also don't know whether it is meliad that prevented me from reproducing the paper's results on GPU or CPU.
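As a first sanity check (this is generic jax usage, not something from this repo), you can ask jax which backend it actually picked up; if the TPU runtime is not visible, jax silently falls back to CPU:

```python
import jax

# List the accelerators jax can see; on a working TPU runtime this
# shows TpuDevice entries, otherwise CpuDevice/GpuDevice.
print(jax.devices())

# The backend jax will run on by default: 'tpu', 'gpu', or 'cpu'.
print(jax.default_backend())
```

If this prints `cpu` on a Colab TPU instance, the jax/libtpu versions are mismatched, and no change to the alphageometry code will help.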

OHHHHH, Here we met:

#143