linbox-team/fflas-ffpack

Failing tests when built with `openblas` on OSX


As described in NixOS/nixpkgs#45013, testing fflas-ffpack fails on OSX when built against openblas:

==============================================
   FFLAS-FFPACK 2.3.2: tests/test-suite.log
==============================================

# TOTAL: 33
# PASS:  11
# SKIP:  0
# XFAIL: 0
# FAIL:  22
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: test-fger
===============

FAIL test-fger (exit status: 139)

FAIL: test-ftrsv
================

Checking with Modular<double> mod 36289873
FAIL test-ftrsv (exit status: 139)

FAIL: test-ftrtri
=================

Checking with Modular<double> mod 131
Checking FTRTRI_Lower_Unit....................PASSED (0.031471)
Checking FTRTRI_Upper_Unit....................PASSED (0.017059)
Checking FTRTRI_Lower_NonUnit.................PASSED (0.023665)
Checking FTRTRI_Upper_NonUnit.................PASSED (0.021203)
Checking with Modular<double> mod 13
Checking FTRTRI_Lower_Unit....................PASSED (0.018172)
Checking FTRTRI_Upper_Unit....................PASSED (0.020317)
Checking FTRTRI_Lower_NonUnit.................PASSED (0.016548)
Checking FTRTRI_Upper_NonUnit.................PASSED (0.00784)
Checking with Modular<double> mod 33811
Checking FTRTRI_Lower_Unit....................PASSED (0.004757)
Checking FTRTRI_Upper_Unit....................PASSED (0.004347)
Checking FTRTRI_Lower_NonUnit.................PASSED (0.004994)
Checking FTRTRI_Upper_NonUnit.................PASSED (0.007178)
Checking with ModularBalanced<double> mod 131
FAIL test-ftrtri (exit status: 139)

FAIL: test-ftrmv
================

Checking with Modular<double> mod 646771
Checking FTRMV_Lower_NoTrans_Unit............PASSED (0.004181)
FAIL test-ftrmv (exit status: 139)

FAIL: test-ftrsm
================

Checking with Modular<double> mod 1547437
FAIL test-ftrsm (exit status: 139)

FAIL: test-ftrsm-check
======================

FAIL test-ftrsm-check (exit status: 139)

FAIL: test-ftrmm
================

Checking with Modular<double> mod 61
FAIL test-ftrmm (exit status: 139)

FAIL: test-fgemm
================

Checking ..........................Modular<double> mod 1747 ...  ** On entry to DGEMM  parameter number 10 had an illegal value
 ** On entry to DGEMM  parameter number 10 had an illegal value
 ** On entry to DGEMM  parameter number 10 had an illegal value
 ** On entry to DGEMM  parameter number 10 had an illegal value
 ** On entry to DGEMM  parameter number 10 had an illegal value
 ** On entry to DGEMM  parameter number 10 had an illegal value
FAIL
a   :1, b   : 0
m   :16, n   : 44, k   : 36
ldA :48, ldB : 47, ldC : 45
Error C[0,0]=813 D[0,0]=202
Error C[0,1]=205 D[0,1]=612
Error C[0,2]=256 D[0,2]=1713
Error C[0,3]=102 D[0,3]=494
Error C[0,4]=872 D[0,4]=631
Error C[0,5]=165 D[0,5]=699
Error C[0,6]=1299 D[0,6]=1459
Error C[0,7]=246 D[0,7]=70
Error C[0,8]=1737 D[0,8]=1069
Error C[0,9]=592 D[0,9]=1088
Error C[0,10]=401 D[0,10]=637
Error C[0,11]=396 D[0,11]=1058
Error C[0,12]=1587 D[0,12]=848
Error C[0,13]=1498 D[0,13]=838
Error C[0,14]=1695 D[0,14]=1059
Error C[0,15]=196 D[0,15]=974
Error C[0,16]=290 D[0,16]=1518
Error C[0,17]=1634 D[0,17]=1479
Error C[0,18]=213 D[0,18]=1132
Error C[0,19]=769 D[0,19]=1695
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
FAILED
FAIL test-fgemm (exit status: 1)

FAIL: test-fgemm-check
======================

FAIL test-fgemm-check (exit status: 139)

FAIL: test-lu
=============

FAIL test-lu (exit status: 139)

FAIL: test-pluq-check
=====================

FAIL test-pluq-check (exit status: 139)

FAIL: test-fsyrk
================

Checking with Modular<double> mod 5
FAIL test-fsyrk (exit status: 139)

FAIL: test-fsytrf
=================

FAIL test-fsytrf (exit status: 139)

FAIL: test-invert-check
=======================

 -q 131071 -n 1000 -i 3 -s 1534291159349551
FAIL test-invert-check (exit status: 139)

FAIL: test-rankprofiles
=======================

FAIL test-rankprofiles (exit status: 139)

FAIL: test-det-check
====================

init: 0.00863194s (0.02807 cpu) [1]
FAIL test-det-check (exit status: 139)

FAIL: test-echelon
==================

FAIL test-echelon (exit status: 139)

FAIL: test-charpoly
===================

FAIL test-charpoly (exit status: 139)

FAIL: test-charpoly-check
=========================

FAIL test-charpoly-check (exit status: 139)

FAIL: test-minpoly
==================

FAIL test-minpoly (exit status: 139)

FAIL: test-solve
================

FAIL test-solve (exit status: 139)

FAIL: test-fgemv
================

FAIL test-fgemv (exit status: 139)

============================================================================
Testsuite summary for FFLAS-FFPACK 2.3.2
============================================================================
# TOTAL: 33
# PASS:  11
# SKIP:  0
# XFAIL: 0
# FAIL:  22
# XPASS: 0
# ERROR: 0
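Of the 22 failures above, 21 end with `exit status: 139`. Under the usual POSIX shell convention, which the automake test driver inherits, a process killed by signal N is reported as 128 + N. So 139 corresponds to signal 11, SIGSEGV: these tests are crashing rather than computing wrong results. `test-fgemm`, which exits with status 1 after printing mismatching entries, is the only non-crash. A minimal sketch of that decoding, assuming a POSIX platform (Linux or macOS) where signal 11 is SIGSEGV; `describe_exit_status` is a hypothetical helper:

```python
import signal

def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper).

    POSIX shells report a child killed by signal N as 128 + N.
    """
    if status > 128:
        # signal.Signals raises ValueError if (status - 128) is not a known signal
        sig = signal.Signals(status - 128)
        return f"killed by {sig.name}"
    return "normal exit" if status == 0 else f"exited with code {status}"

for status in (0, 1, 139):
    print(status, "->", describe_exit_status(status))
```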
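The `test-fgemm` output also contains `** On entry to DGEMM  parameter number 10 had an illegal value`, which is the BLAS argument check (XERBLA) rejecting a call. In the reference Fortran `DGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)`, argument 10 is LDB. It must be at least max(1, K) when B is not transposed, and at least max(1, N) otherwise. fflas-ffpack works in row-major order through CBLAS. The reference CBLAS wrapper implements a row-major call by swapping A with B and M with N before calling the Fortran routine, so in that path "parameter 10" is the caller's row-major `lda`. The sketch below mirrors the reference checks. OpenBLAS has its own interface code and may differ in detail, and the helper names are hypothetical:

```python
def dgemm_info(transa, transb, m, n, k, lda, ldb, ldc):
    """Parameter number reference (column-major) DGEMM would report, 0 if valid."""
    nota = transa.upper() == "N"
    notb = transb.upper() == "N"
    nrowa = m if nota else k  # rows of A as stored
    nrowb = k if notb else n  # rows of B as stored
    checks = [
        (transa.upper() not in ("N", "T", "C"), 1),
        (transb.upper() not in ("N", "T", "C"), 2),
        (m < 0, 3),
        (n < 0, 4),
        (k < 0, 5),
        (lda < max(1, nrowa), 8),
        (ldb < max(1, nrowb), 10),
        (ldc < max(1, m), 13),
    ]
    for failed, param in checks:
        if failed:
            return param  # the first failing check wins, as in dgemm.f
    return 0

def rowmajor_dgemm_info(transa, transb, m, n, k, lda, ldb, ldc):
    """Row-major C = op(A) op(B), mapped as the reference CBLAS does:
    the Fortran routine computes C^T = op(B)^T op(A)^T, so A/B and M/N swap."""
    return dgemm_info(transb, transa, n, m, k, ldb, lda, ldc)

# Dimensions printed by test-fgemm: m=16, n=44, k=36, ldA=48, ldB=47, ldC=45
for ta in "NT":
    for tb in "NT":
        print(ta, tb, rowmajor_dgemm_info(ta, tb, 16, 44, 36, 48, 47, 45))
```

With the dimensions printed in the log, every check passes whatever the transposition flags. If the reference rules apply, the rejected DGEMM call would then be some other call made during the test. The thread does not establish which one.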

Hi,
Thanks for the report, and sorry for the delay.
It took me a while to get an OSX box running. Unfortunately, I cannot reproduce the bug.
I tried both the latest releases (which are the ones packaged in SageMath) and upstream givaro and fflas-ffpack. The whole test suite passes in both cases.

I'm using upstream OpenBLAS (at commit a71923514fb9ad611fb87345b8cfa4383404b54f) and the g++ provided by Apple LLVM on Darwin 13.4.0.

ciosx:tests ci$ uname -a
Darwin ciosx 13.4.0 Darwin Kernel Version 13.4.0: Wed Mar 18 16:20:14 PDT 2015; root:xnu-2422.115.14~1/RELEASE_X86_64 x86_64
ciosx:tests ci$ g++ -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.51) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix

Could you give more details on your compilation environment?

Hi,
I get one of these errors on OSX with fflas-ffpack and openblas.

$ more test-suite.log
==============================================
   FFLAS-FFPACK 2.3.2: tests/test-suite.log
==============================================

# TOTAL: 34
# PASS:  33
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: test-fgemm-check
======================

Checking ..........................Modular<double> mod 1693 ... PASSED with seed = 1541770760090127
Checking .........................Modular<double> mod 12421 ... PASSED with seed = 1541770760090128
Checking ...........................Modular<double> mod 139 ... PASSED with seed = 1541770760090129
Checking ..................ModularBalanced<double> mod 1693 ... PASSED with seed = 1541770760090127
Checking .................ModularBalanced<double> mod 12421 ... PASSED with seed = 1541770760090128
Checking ...................ModularBalanced<double> mod 139 ... PASSED with seed = 1541770760090129
Checking ...........................Modular<float> mod 1693 ... PASSED with seed = 1541770760090127
Checking .............................Modular<float> mod 47 ... FAILED
FAILED with seed = 1541770760090128
FAIL test-fgemm-check (exit status: 1)

Config
fflas_ffpack: 2.3.2
openblas: 0.2.20 (with some patches)

$ uname -a
Darwin laptop-147-210-128-171.labri.fr 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64
$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/Users/doctorant/sage/local/libexec/gcc/x86_64-apple-darwin17.7.0/7.2.0/lto-wrapper
Target: x86_64-apple-darwin17.7.0
Configured with: ../src/configure --prefix=/Users/doctorant/sage/local --with-local-prefix=/Users/doctorant/sage/local --with-gmp=/Users/doctorant/sage/local --with-mpfr=/Users/doctorant/sage/local --with-mpc=/Users/doctorant/sage/local --with-system-zlib --disable-multilib --disable-nls --enable-languages=c,c++,fortran --disable-libitm --with-build-config=bootstrap-debug --without-isl --without-cloog  
Thread model: posix
gcc version 7.2.0 (GCC) 

Link to the Sage ticket: #26130.

I just accidentally reproduced the test-fgemm-check failure on Linux by building OpenBLAS without OpenMP. NixOS builds without OpenMP by default on Darwin, so that may be the issue, or one of the issues.

Okay, this may just have been a coincidence: with the exact same environment, the second build succeeded. I'm going to try a few more times to see whether it recurs (and whether OpenMP makes any difference).

The failure also occurs with OpenMP enabled. It just occurs a lot more often when OpenMP is disabled: 3 out of 10 runs without OpenMP, 1 out of 20 with it.

Of course, that could just be coincidence, or system load may be a factor.
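One way to put a number on "could just be coincidence" is Fisher's exact test on the reported counts (3 failures in 10 runs without OpenMP versus 1 in 20 with it). This minimal sketch computes the one-sided p-value directly from the hypergeometric distribution using only the standard library. It assumes every run is independent, which load effects could violate; `fisher_one_sided` is a hypothetical helper:

```python
from math import comb

def fisher_one_sided(fail_a, runs_a, fail_b, runs_b):
    """P(at least fail_a failures land in group A) if failures are spread
    over all runs independently of the group (hypergeometric null)."""
    total_runs = runs_a + runs_b
    total_fail = fail_a + fail_b
    denom = comb(total_runs, runs_a)
    # Sum the probabilities of outcomes at least as extreme as observed
    return sum(
        comb(total_fail, x) * comb(total_runs - total_fail, runs_a - x)
        for x in range(fail_a, min(total_fail, runs_a) + 1)
    ) / denom

p = fisher_one_sided(3, 10, 1, 20)  # without OpenMP vs. with OpenMP
print(f"one-sided p = {p:.3f}")  # prints "one-sided p = 0.095"
```

A p-value of about 0.095 means that, with this few runs, the difference is suggestive but not conclusive at the conventional 5% level.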

It looks like that failure (and probably also @vinklein's) is actually #146, which is fixed on master but not yet in a released version.

P1K commented

Hi,
Since I have an OSX install, I thought I'd run some tests myself: I get no failing tests on Darwin with OpenBLAS. In particular, I let test-fgemm-check run for several minutes without hitting a failure.

OK, this seems to be fixed. Closing.