need help: test fails on darwin x86-64 (but curiously not aarch64)
donn opened this issue · 4 comments
The test test_tch1dn/zch1dn fails on x86-64 versions of macOS, but not on aarch64.
The residual errors are worse overall on the former, but only for zch1dn is the error large enough to tip over into a failure.
Is this a serious problem? Is there a way for me to adjust the threshold?
x86-64 test log
Output:
----------------------------------------------------------
testing Cholesky rank-1 downdate routines.
All residual errors are expected to be small.
sch1dn test:
residual error = 0.572204589844E-05 PASS
dch1dn test:
residual error = 0.888178419700E-14 PASS
cch1dn test:
residual error = 0.953972266871E-05 PASS
zch1dn test:
residual error = 0.497379915032E-13 FAIL
----------------------------------------------------------------------
total: PASSED 3 FAILED 1
aarch64 test log
Output:
----------------------------------------------------------
testing Cholesky rank-1 downdate routines.
All residual errors are expected to be small.
sch1dn test:
residual error = 0.476837158203E-05 PASS
dch1dn test:
residual error = 0.106581410364E-13 PASS
cch1dn test:
residual error = 0.152587890625E-04 PASS
zch1dn test:
residual error = 0.284217094304E-13 PASS
----------------------------------------------------------------------
total: PASSED 4 FAILED 0
Can you give some more details about the BLAS library used? There must be a reason for the difference. Nevertheless, it seems safe to adjust the tolerance a bit.
Just change the factor 2D2 in
Line 258 in 44a34de
to 1D3.
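For anyone wondering how much headroom that change gives, here is a rough sketch. It assumes the check compares the residual against the factor times double-precision machine epsilon (about 2.22E-16). The thread does not confirm that, but it is consistent with the PASS/FAIL results in both logs. The function name `passes` is made up for illustration and is not part of qrupdate.

```python
import sys

# Assumption (not confirmed in this thread): the test passes when
# residual < factor * eps, with eps the IEEE double-precision epsilon.
EPS = sys.float_info.epsilon


def passes(residual, factor, eps=EPS):
    """Hypothetical stand-in for the test's tolerance check."""
    return residual < factor * eps


# zch1dn residuals copied from the two logs above.
residuals = {
    "x86-64": 0.497379915032e-13,
    "aarch64": 0.284217094304e-13,
}

for factor in (2e2, 1e3):
    print(f"factor = {factor:g}, threshold = {factor * EPS:.3e}")
    for arch, residual in residuals.items():
        verdict = "PASS" if passes(residual, factor) else "FAIL"
        print(f"  {arch}: residual = {residual:.3e} {verdict}")
```

Under that assumption, the x86-64 residual sits just above the 2D2 threshold and comfortably below the 1D3 one, so the failure looks like a borderline tolerance issue rather than a wrong result.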
Hey, on Nixpkgs (which also distributes software for both Darwin platforms) we experience exactly the same issue on x86_64-darwin (and not on aarch64-darwin):
1/13 Test #1: test_tch1dn ......................***Failed 16.07 sec
testing Cholesky rank-1 downdate routines.
All residual errors are expected to be small.
sch1dn test:
residual error = 0.572204589844E-05 PASS
dch1dn test:
residual error = 0.106581410364E-13 PASS
cch1dn test:
residual error = 0.953972266871E-05 PASS
zch1dn test:
residual error = 0.497379915032E-13 FAIL
----------------------------------------------------------------------
total: PASSED 3 FAILED 1
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP 1
Unfortunately, I don't have the floating-point output from aarch64-darwin available (because the tests passed there). Our blas and lapack implementations are both based on openblas version 0.3.27. The build log for it (for x86_64-darwin) is available here:
https://cache.nixos.org/log/vdk8dns4jvy1n7w1djhdy7i1a3ph37p0-openblas-0.3.27.drv
I don't personally have an x86_64-darwin machine, so I can only use our CI, which is unfortunately very slow for these platforms, and I won't be able to help much with debugging. I hope the information I provided helps a bit.
It's the same blas version FWIW. I am using Nix to build qrupdate.
It seems that the tolerances need to be adjusted a bit more. I'll prepare a patch in the next few days. The fix seems to be adjusting this line:
Line 258 in 44a34de