OpenMathLib/OpenBLAS

[regression] 0.3.29 build/tests fail on Sandy Bridge x86_64 machine

Closed this issue · 2 comments

Building and running the tests fails on an older Sandy Bridge x86_64 machine under FreeBSD (built through the FreeBSD ports system):

OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat2 < ./sblat2.dat
rm -f ?BLAT3.SUMM
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat3 < ./sblat3.dat
Note: The following floating-point exceptions are signalling: IEEE_DIVIDE_BY_ZERO
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat3 < ./dblat3.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat2 < ./dblat2.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat3 < ./cblat3.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat2 < ./cblat2.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat3 < ./zblat3.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat2 < ./zblat2.dat
rm -f ?BLAT3.SUMM
OMP_NUM_THREADS=2 ./sblat3 < ./sblat3.dat
Note: The following floating-point exceptions are signalling: IEEE_DIVIDE_BY_ZERO
OMP_NUM_THREADS=2 ./dblat3 < ./dblat3.dat
OMP_NUM_THREADS=2 ./cblat3 < ./cblat3.dat
rm -f ?BLAT2.SUMM
OMP_NUM_THREADS=2 ./sblat2 < ./sblat2.dat

Program received signal SIGBUS: Access to an undefined portion of a memory object.

Backtrace for this error:
#0  0x824e20339 in ???
#1  0x824e1f465 in ???
#2  0x8220e746f in ???
#3  0x8220e6a3a in ???
#4  0x82157f2d2 in ???
#5  0x82f6f24ea in _Unwind_ForcedUnwind
        at /usr/ports/lang/gcc13/work/gcc-13.3.0/libgcc/unwind.inc:215
#6  0x8220de21b in ???
#7  0x8220de191 in ???
#8  0x8220de03a in ???
#9  0x8220ddb29 in ???
#10  0xffffffffffffffff in ???

./sblat2 < ./sblat2.dat fails with OMP_NUM_THREADS=2 (or any non-1 value), but passes with OMP_NUM_THREADS=1.
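To make the thread-count dependence easy to re-check, the comparison above could be scripted roughly as follows. This is only a sketch: the helper name `sweep_threads` and the thread counts 1, 2 and 4 are illustrative assumptions, not part of the OpenBLAS test suite.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenBLAS): run one test binary under
# several OMP_NUM_THREADS values and report which of them fail.
# Arguments: $1 = test binary, $2 = input file fed on stdin.
sweep_threads() {
    bin=$1
    input=$2
    for n in 1 2 4; do
        # The variable assignment applies only to this one command.
        if OMP_NUM_THREADS=$n "$bin" < "$input" > /dev/null 2>&1; then
            echo "OMP_NUM_THREADS=$n: pass"
        else
            # In the else branch, $? still holds the exit status of the test.
            echo "OMP_NUM_THREADS=$n: FAIL (exit status $?)"
        fi
    done
}
```

Run from the directory holding the test binaries, for example `sweep_threads ./sblat2 ./sblat2.dat`. A process killed by a signal such as SIGBUS is reported with an exit status above 128.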

The processor supports AVX, but not AVX2 or any later extension:

CPU: Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz (2294.90-MHz K8-class CPU)
Origin="GenuineIntel"  Id=0x206a7  Family=0x6  Model=0x2a  Stepping=7
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x1dbae3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,XSAVE,OSXSAVE,AVX>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x1<LAHF>
XSAVE Features=0x1<XSAVEOPT>
VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
TSC: P-state invariant, performance statistics

The compiler is gcc13; the Makefile.rules options are:

NO_AVX2=1
NO_AVX512=1
USE_OPENMP=1
BINARY=64

However, I checked that it fails regardless of whether any of USE_OPENMP, INTERFACE64 or NO_AVX is set. It also fails in the same way regardless of the -O level, and the failure occurs in both the 0.3.29 release and HEAD at 1533fe49bef51ff49e4358a2687f1e475801f9fd, while everything builds fine with the older 0.3.27.

Not reproducible with gcc14 and TARGET=SANDYBRIDGE on Zen5 hardware, and also not reproducible with gcc4 or gcc9 on actual Sandy Bridge hardware under Linux. I still need to build gcc-14 on that old system, however.

I've tried it with gcc14 instead, and it builds and passes the tests successfully, so this may be a platform-specific miscompilation by gcc13 that does not occur with gcc14. The issue can thus be closed, but if you feel it is worth checking, I suggest trying to reproduce it with gcc13 on that Linux machine and, if it fails, mentioning that in the release notes: gcc13 is the default on Ubuntu 24.04 (the latest LTS release), as well as on FreeBSD stable for Fortran-coded packages.