fortran-lang/stdlib

Floating point exception encountered in `svdvals(A)` originating from a call to `stdlib_ieeeck`

Closed this issue · 1 comments

Description

I've been playing around with the new matrix norm function mnorm and found that a floating point exception (a division by zero in particular) is raised if I compile the run code with

fpm run --compiler "gfortran" --flag "-Og -g3 -fbacktrace -ffpe-trap=zero"

After some quick investigation, it seems that the culprit is somewhere in the calling tree of svdvals. Here is a MWE.

program main
  use stdlib_linalg_constants, only: dp
  use stdlib_linalg, only: mnorm, svdvals, svd
  implicit none
  integer, parameter :: n = 3
  real(dp) :: A(n, n)
  real(dp) :: S(n)
  !> Initialize random matrix.
  call random_number(A)
  !> Compute its singular values.
  S = svdvals(A)
end program main

When compiling with the command above, I get

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f907d4b851f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#1  0x555ee55a277a in __stdlib_linalg_lapack_aux_MOD_stdlib_ieeeck
	at build/dependencies/stdlib/src/stdlib_linalg_lapack_aux.F90:136
#2  0x555ee55a3a01 in __stdlib_linalg_lapack_aux_MOD_stdlib_ilaenv
	at build/dependencies/stdlib/src/stdlib_linalg_lapack_aux.F90:1369
#3  0x555ee5788b76 in __stdlib_lapack_eig_svd_lsq_MOD_stdlib_dlasq2
	at build/dependencies/stdlib/src/stdlib_lapack_svd_bidiag_qr.f90:725
#4  0x555ee578ac7e in __stdlib_lapack_eig_svd_lsq_MOD_stdlib_dlasq1
	at build/dependencies/stdlib/src/stdlib_lapack_svd_bidiag_qr.f90:169
#5  0x555ee56e08e5 in __stdlib_lapack_eig_svd_lsq_MOD_stdlib_dbdsqr
	at build/dependencies/stdlib/src/stdlib_lapack_eigv_svd_drivers3.f90:561
#6  0x555ee57d5860 in __stdlib_lapack_eig_svd_lsq_MOD_stdlib_dlasdq
	at build/dependencies/stdlib/src/stdlib_lapack_eigv_svd_bidiag_dc.f90:3504
#7  0x555ee56da988 in __stdlib_lapack_eig_svd_lsq_MOD_stdlib_dbdsdc
	at build/dependencies/stdlib/src/stdlib_lapack_eigv_svd_drivers3.f90:2239
#8  0x555ee559ab05 in __stdlib_lapack_eig_svd_lsq_MOD_stdlib_dgesdd
	at build/dependencies/stdlib/src/stdlib_lapack_eigv_svd_drivers2.f90:1517
#9  0x555ee554c060 in __stdlib_linalg_MOD_stdlib_linalg_svd_d
	at build/dependencies/stdlib/src/stdlib_linalg_svd.f90:489
#10  0x555ee554dbff in __stdlib_linalg_MOD_stdlib_linalg_svdvals_d
	at build/dependencies/stdlib/src/stdlib_linalg_svd.f90:320
#11  0x555ee55446c4 in MAIN__
	at app/main.f90:11
#12  0x555ee554472d in main
	at app/main.f90:2
Floating point exception (core dumped)
<ERROR> Execution for object " MWE_stdlib_mnorm " returned exit code  136
<ERROR> *cmd_run*:stopping due to failed executions
STOP 136

I get something very similar if I compile with ifx instead

<INFO> BUILD_NAME: build/ifx
 <INFO> COMPILER:  ifx
 <INFO> C COMPILER:  icx
 <INFO> CXX COMPILER: icpx
 <INFO> COMPILER OPTIONS:  -fpp -O0 -g -traceback -check all -check bounds -check uninit, -ftrapuv -debug all
 <INFO> C COMPILER OPTIONS:  
 <INFO> CXX COMPILER OPTIONS: 
 <INFO> LINKER OPTIONS:  
 <INFO> INCLUDE DIRECTORIES:  []
[100%] Project compiled successfully.
 + build/ifx_A1EB479C3608C637/app/MWE_stdlib_mnorm 
forrtl: error (73): floating divide by zero
Image              PC                Routine            Line        Source             
MWE_stdlib_mnorm   000000000046E723  Unknown               Unknown  Unknown
libc.so.6          00007F70F79C1520  Unknown               Unknown  Unknown
MWE_stdlib_mnorm   0000000000BF00BC  stdlib_ieeeck             136  stdlib_linalg_lapack_aux.F90
MWE_stdlib_mnorm   0000000000C22DDF  stdlib_ilaenv            1369  stdlib_linalg_lapack_aux.F90
MWE_stdlib_mnorm   00000000040422C5  stdlib_dlasq2             725  stdlib_lapack_svd_bidiag_qr.f90
MWE_stdlib_mnorm   00000000040168FD  stdlib_dlasq1             170  stdlib_lapack_svd_bidiag_qr.f90
MWE_stdlib_mnorm   0000000002E9A13B  stdlib_dbdsqr             562  stdlib_lapack_eigv_svd_drivers3.f90
MWE_stdlib_mnorm   0000000004B24626  stdlib_dlasdq            3547  stdlib_lapack_eigv_svd_bidiag_dc.f90
MWE_stdlib_mnorm   0000000002F4C703  stdlib_dbdsdc            2238  stdlib_lapack_eigv_svd_drivers3.f90
MWE_stdlib_mnorm   00000000005F3BCE  stdlib_dgesdd            1516  stdlib_lapack_eigv_svd_drivers2.f90
MWE_stdlib_mnorm   00000000004E11A5  stdlib_linalg_svd         488  stdlib_linalg_svd.f90
MWE_stdlib_mnorm   00000000004C4A91  stdlib_linalg_svd         320  stdlib_linalg_svd.f90
MWE_stdlib_mnorm   000000000049A926  main                       11  main.f90
MWE_stdlib_mnorm   000000000040E339  Unknown               Unknown  Unknown
libc.so.6          00007F70F79A8D90  Unknown               Unknown  Unknown
libc.so.6          00007F70F79A8E40  __libc_start_main     Unknown  Unknown
MWE_stdlib_mnorm   000000000040E236  Unknown               Unknown  Unknown
Aborted
<ERROR> Execution for object " MWE_stdlib_mnorm " returned exit code  134
<ERROR> *cmd_run*:stopping due to failed executions
STOP 134

Here are the first few lines of stdlib_ieeeck:

 pure integer(ilp) function stdlib_ieeeck( ispec, zero, one )
 !! IEEECK is called from the ILAENV to verify that Infinity and
 !! possibly NaN arithmetic is safe (i.e. will not trap).
    ! -- lapack auxiliary routine --
    ! -- lapack is a software package provided by univ. of tennessee,    --
    ! -- univ. of california berkeley, univ. of colorado denver and nag ltd..--
       ! Scalar Arguments 
       integer(ilp), intent(in) :: ispec
       real(sp), intent(in) :: one, zero
    ! =====================================================================
       ! Local Scalars 
       real(sp) :: nan1, nan2, nan3, nan4, nan5, nan6, neginf, negzro, newzro, posinf
       ! Executable Statements 
       stdlib_ieeeck = 1
       posinf = one / zero

The error seems to originate from this last line with the division by zero being trapped when compiling with -ffpe-trap=zero. This seems to be a recuring cause of concerns when working in debug mode (see here or here for instance). It probably is more of a lapack issue than stdlib per say but even so, if any one knows a work-around that'd be great. Maybe a known issues page could be added to the stdlib website where this error is being referenced for everyone to be aware.

Expected Behaviour

It should just work I guess.

Version of stdlib

0.7

Platform and Architecture

Ubuntu 22.04.5 LTS

Additional Information

Compilers tested:

  • gfortran 14.2.0
  • ifx 2025.0.4 (20241205)

Some additional observations:

  • If instead of a random matrix, I instantiate the matrix as A = eye(n, mold=1.0_dp), the whole thing works correctly.
  • If I do call svd(A, S, U, Vt), the whole thing works correctly.

Ping @perazz, @jvdp1, @jalvesz

A quick search shows that:

  • the check is made when calling stdlib_ilaenv(10,...) (check that inf and nan math is supported) or stdlib_ilaenv(11,...) (check inf only).
  • stdlib_ilaenv(11,...) seems to never get used
  • stdlib_ilaenv(10,...) appears a few times in *HEEVR, *SYEVR, *STEVR, *LASQ2

I think we could just replace these tests with the modern ieee_arithmetic module:

     pure integer(ilp) function stdlib_ieeeck(ispec,zero,one)
        ! -- lapack auxiliary routine --
        ! -- lapack is a software package provided by univ. of tennessee,    --
        ! -- univ. of california berkeley, univ. of colorado denver and nag ltd..--
           ! Scalar Arguments
           integer(ilp),intent(in) :: ispec
           real(sp),intent(in) :: one,zero
        ! =====================================================================
           ! Executable Statements
           stdlib_ieeeck = 1
           ! Test support for infinity values
           if (.not.ieee_support_inf(one)) then 
              stdlib_ieeeck = 0
              return
           end if
           ! return if we were only asked to check infinity arithmetic
           if (ispec == 0) return
           if (.not.ieee_support_nan(one)) then
              stdlib_ieeeck = 0
              return
           end if
           return
     end function stdlib_ieeeck