[regression] 0.3.29 test failure on ARMV6
openblas 0.3.29 fails to build on Debian's armhf machine: https://buildd.debian.org/status/fetch.php?pkg=openblas&arch=armhf&ver=0.3.29%2Bds-1&stamp=1738175608&raw=0
It seems to be a regression in the cblas_stbmv function:
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 0.157343 0.127373
2 0.00000 0.00000
3 0.447053 0.447053
4 0.277223 0.277223
5 0.806693 0.806693
******* cblas_stbmv FAILED ON CALL NUMBER:
580: cblas_stbmv ( CblasUpper, CblasNoTrans, CblasUnit,
5, 0, A, 2, X,-1) .
******* FATAL ERROR - TESTS ABANDONED *******
ERROR STOP
Debian uses a modified source tree, but I can reproduce this problem with the upstream git repo:
TARGET=ARMV6 make
The error message reads:
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 0.245164 0.245164
2 0.00000 0.00000
3 0.539071 0.539071
4 1.22516 1.22516
5 0.601097E-01 0.374625E-01
******* cblas_stbmv FAILED ON CALL NUMBER:
582: cblas_stbmv ( CblasUpper, CblasNoTrans, CblasNonUnit,
5, 0, A, 2, X, 1) .
******* cblas_stbmv FAILED ON CALL NUMBER:
2: cblas_stbmv ( CblasUpper, CblasNoTrans, CblasUnit,
1, 0, A, 2, X, 1) .
******* FATAL ERROR - TESTS ABANDONED *******
ERROR STOP
This is a regression because 0.3.28 does not fail the test. I checked the diff between 0.3.28 and 0.3.29 but did not find anything straightforward. Any idea?
It seems to be a threading issue:
$ ./xscblat2 < sin2
[...]
cblas_stbmv PASSED THE TESTS OF ERROR-EXITS
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 0.127373 0.646853
2 0.343563E-02 0.343563E-02
3 0.317183 0.317183
4 0.566511 0.566511
5 0.463458 0.463458
******* cblas_stbmv FAILED ON CALL NUMBER:
661: cblas_stbmv ( CblasLower, CblasTrans, CblasUnit,
5, 1, A, 3, X,-2) .
******* cblas_stbmv FAILED ON CALL NUMBER:
2: cblas_stbmv ( CblasUpper, CblasNoTrans, CblasUnit,
1, 0, A, 2, X, 1) .
******* FATAL ERROR - TESTS ABANDONED *******
ERROR STOP
But the test can pass in serial mode:
OPENBLAS_NUM_THREADS=1 ./xscblat2 < sin2
Update: seems to be pthread related. OpenMP is fine:
make TARGET=ARMV6 USE_OPENMP=1 FCOMMON_OPT='-frecursive -fopenmp'
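To see at which thread count the ARMV6 failure starts to show up, something like the following could be used. This is only a sketch: `sweep_threads` is a hypothetical helper name, and treating any output containing `FATAL ERROR` (the text printed in the logs above) or a non-zero exit status as a failure is an assumption about how the test reports errors.

```shell
# Hypothetical helper: run a test command once per thread count and
# print PASS or FAIL for each count.
#   $1    environment variable to set (e.g. OPENBLAS_NUM_THREADS)
#   $2    command line to run, passed to sh -c
#   $3... thread counts to try
sweep_threads() {
    var=$1
    cmd=$2
    shift 2
    for n in "$@"; do
        rc=0
        # Capture stdout and stderr; remember a non-zero exit status.
        out=$(env "$var=$n" sh -c "$cmd" 2>&1) || rc=$?
        # Assumption: a failing run prints "FATAL ERROR", as in the logs above.
        case "$out" in
            *"FATAL ERROR"*) rc=1 ;;
        esac
        if [ "$rc" -eq 0 ]; then
            echo "$n PASS"
        else
            echo "$n FAIL"
        fi
    done
}

# Example (run from the directory that contains xscblat2 and sin2):
# sweep_threads OPENBLAS_NUM_THREADS './xscblat2 < sin2' 1 2 4
```

A single run per thread count can miss an intermittent failure, so a PASS line here is weaker evidence than a FAIL line.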
Curious that you would get an error in tbmv (only). Did you change compilers between 0.3.28 and 0.3.29, perchance?
I did not change the compiler. It's gcc-14 14.2.0-16 from Debian unstable.
I see - still seems weird that it is STBMV of all functions. I won't be able to reproduce this or eventually locate the commit that caused this until late next week. (Nothing immediately obvious that wouldn't also mess up all of GEMM)
I'm increasingly certain that none of the files/functions relevant for multithreaded TBMV were changed between 0.3.28 and 0.3.29... could be this is an older bug, a missing memory barrier somewhere or something like that, which has only a low probability of occurring. Is the armhf machine actual hardware or something like qemu?
The armhf machine is a real arm64 machine with a Neoverse-N1 CPU. (Debian's infrastructure is all real machines. I tested on Debian's porterbox, which is also a real machine.)
Recently I have seen some regression bugs in gcc-14 that lead to internal compiler errors when compiling pytorch (downgrading to gcc-13 solves the issue). On the speculation that something might be wrong on the toolchain side here too, I tested with gcc-13, but the same issue persists.
# cblas_stbmv is still failing at git HEAD (c139b63342b3e089b6507d45f31f062a7fbe6dcc)
make TARGET=ARMV6 CC=gcc-13 CXX=g++-13 FC=gfortran-13
Is this a Docker container running a 32-bit OS on the armv8 hardware?
It's schroot (https://wiki.debian.org/Schroot), a kind of chroot. It is also the backend of Debian's official build machines (through sbuild). I guess running this in docker may get something similar.
I'm trying to do a bisect. Will update later.
I'm on a train with variable network quality, but from checking file dates and commit logs anything remotely relevant would have happened before 0.3.28.
My local git bisect result is
# first bad commit: [d9f368dfe6a9e96807d3860b96d9b30471583dc9] TST: Signal abort for ctest failures correctly
Looking at d9f368d, I agree with you that the problem was introduced somewhere in the past.
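Since the bisect landed on a commit that changes how test failures are signalled, a bisect script that keys on the printed `FATAL ERROR` text instead of the exit status might get further back. The following is a rough sketch for use with `git bisect run`; `bisect_verdict`, the default of 20 repetitions, and the build and test commands in the usage comment are all assumptions, not something taken from the OpenBLAS tree.

```shell
# Hypothetical helper for "git bisect run".
#   $1  build command (passed to sh -c)
#   $2  test command (passed to sh -c)
#   $3  number of repetitions, default 20 (the failure is intermittent)
# Return codes follow the git bisect run convention:
#   0 = good, 1 = bad, 125 = cannot test this commit (skip).
bisect_verdict() {
    build_cmd=$1
    test_cmd=$2
    runs=${3:-20}
    # If the build itself fails, ask git bisect to skip this commit.
    sh -c "$build_cmd" >/dev/null 2>&1 || return 125
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Ignore the exit status on purpose; only look at the printed text.
        out=$(sh -c "$test_cmd" 2>&1) || true
        case "$out" in
            *"FATAL ERROR"*) return 1 ;;
        esac
        i=$((i + 1))
    done
    return 0
}

# Example wrapper script for "git bisect run" (commands are placeholders):
#   bisect_verdict 'make TARGET=ARMV6' './xscblat2 < sin2' 20
```

One caveat: in this thread the plain `TARGET=ARMV6 make` already ran the tests and stopped on the error, so the build command given to the helper would need to avoid running the failing test, otherwise bad commits get reported as "skip". A clean run of 20 repetitions also does not prove a commit is good when the failure probability is low.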
Might be unrelated, but OpenBLAS was recently updated to 0.3.29 in FreeBSD ports as well, and I'm experiencing a similar threading/timing test failure when building it locally (via the FreeBSD ports infrastructure), although this is on amd64:
...
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat2 < ./sblat2.dat
rm -f ?BLAT3.SUMM
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat3 < ./sblat3.dat
Note: The following floating-point exceptions are signalling: IEEE_DIVIDE_BY_ZERO
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat3 < ./dblat3.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat2 < ./dblat2.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat3 < ./cblat3.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat2 < ./cblat2.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat3 < ./zblat3.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat2 < ./zblat2.dat
rm -f ?BLAT3.SUMM
OMP_NUM_THREADS=2 ./sblat3 < ./sblat3.dat
Note: The following floating-point exceptions are signalling: IEEE_DIVIDE_BY_ZERO
OMP_NUM_THREADS=2 ./dblat3 < ./dblat3.dat
OMP_NUM_THREADS=2 ./cblat3 < ./cblat3.dat
rm -f ?BLAT2.SUMM
OMP_NUM_THREADS=2 ./sblat2 < ./sblat2.dat
Program received signal SIGBUS: Access to an undefined portion of a memory object.
Backtrace for this error:
#0 0x824e20339 in ???
#1 0x824e1f465 in ???
#2 0x8220e746f in ???
#3 0x8220e6a3a in ???
#4 0x82157f2d2 in ???
#5 0x82f6f24ea in _Unwind_ForcedUnwind
at /usr/ports/lang/gcc13/work/gcc-13.3.0/libgcc/unwind.inc:215
#6 0x8220de21b in ???
#7 0x8220de191 in ???
#8 0x8220de03a in ???
#9 0x8220ddb29 in ???
#10 0xffffffffffffffff in ???
./sblat2 < ./sblat2.dat fails with OMP_NUM_THREADS=2 (or any non-1 value), but passes with OMP_NUM_THREADS=1.
The compiler is gcc13; it fails the same way with OpenMP on or off, regardless of -O level.
Another clue is that the official FreeBSD package (from the same port/recipe) is already available for this update, which means the same port/recipe was successfully built by the official FreeBSD build cluster - likely much newer/faster hardware than my local (quite dated) machine.
@xmirya probably unrelated given the difference in architectures, but what is your hardware, please? (The optimized BLAS kernels are different for individual CPU models.) SIGBUS instead of simply producing bad results could mean an access to unaligned data where the instruction used requires data alignment.
It's an i5-2410M; it has SSE* and AVX, but no AVX2 or higher. The build is done with
MAKE_NB_JOBS=-1
NUM_THREADS=64
USE_THREAD=1
NO_AVX2=1
NO_AVX512=1
USE_OPENMP=1
BINARY=64
added to Makefile.rules (disabling OpenMP and/or adding NO_AVX=1 makes no difference)
@xmirya that would appear to be a standard Sandy Bridge target - I cannot reproduce the failure on Ryzen5 but can take an actual SandyBridge system out of storage sometime next week
Thanks, I would be grateful if you could.
Can reproduce the ARMV6 issue (on a first generation Asus Tinkerboard, Cortex-A12, so natively ARMV7 building for TARGET=ARMV6). The error only happens in "some" runs of the test, which might explain why it has not come up earlier. Curiously, I have not gotten the corresponding DTBMV test to fail, although both share the same level2 driver.
Interestingly, it is only the CBLAS version of the test that fails, which makes a fault in the test code itself likely - internally, both the CBLAS and BLAS versions of the call take the exact same code path after reshuffling of the arguments.
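Because the error only shows up in some runs, a failure count over many runs is probably more telling than a single pass or fail, and the same loop can be pointed at both the CBLAS and the plain BLAS test binaries to compare them. A minimal sketch, where `count_failures` is a hypothetical helper and the `FATAL ERROR` check is an assumption based on the logs quoted earlier:

```shell
# Hypothetical helper: run a command N times and print "failures/runs".
#   $1  number of runs
#   $2  command line to run, passed to sh -c
count_failures() {
    runs=$1
    cmd=$2
    fails=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        rc=0
        out=$(sh -c "$cmd" 2>&1) || rc=$?
        # Count the run as failed on a non-zero exit status or on the
        # "FATAL ERROR" text seen in the test output above.
        case "$out" in
            *"FATAL ERROR"*) rc=1 ;;
        esac
        [ "$rc" -eq 0 ] || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails/$runs"
}

# Example (paths depend on where the test binaries were built):
# count_failures 100 './xscblat2 < sin2'
```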