LSODPK example results sensitive to choice of compiler
Opened this issue · 12 comments
With your latest sources (13 Oct 2022), I find that with some compilers the run for the LSODPK example produces many warnings with the pattern
DLSODPK- Warning. Poor iterative algorithm performance seen
at T = R1 by linear convergence failure rate = R2
In above, R1 = 0.1377169669546D+01 R2 = 0.9166666865349D+00
Here are some specific details, all on a Windows 11-64 PC.
S:\ALGO\ODEPACK\UJ2\odepack-master\src>ifort /O2 /c M_odepack.f90
S:\ALGO\ODEPACK\UJ2\odepack-master\example>ifort /O2 /I..\src lsodpk.f90 ..\src\M_odepack.obj
S:\ALGO\ODEPACK\UJ2\odepack-master\example>del ccout & lsodpk
generates 10 sets of warnings
If I use the Ifx compiler instead of Ifort, no warnings. If I use Gfortran 11.3 (Cygwin64), with -fallow-argument-mismatch -O2, no warnings. If I use the NAG 7.1 compiler with -dusty -O2, I get the warnings.
My hunch is that this sensitivity of the algorithm to slight variations in floating point calculations is inherent in the algorithms used, rather than being caused by bugs in the source code or the compilers.
Those messages are from the program itself, and so far I have not found any compiler options on a Linux box using ifort, gfortran, and nvfortran that generate them. As mentioned, they are probably due to a numeric sensitivity in the algorithm, which ODEPACK is aware of enough to generate messages; but it would be good to know it is not from an undefined variable or some other cause; or to know a little more about what is causing the problem. The routines have been around long enough and I can see no other mention of issues with the test and example problems that it is surprising the errors are being produced even though they could be expected for some problems. The results are still the same?
I am able to reproduce these warning messages from the programs on a Linux box with the Intel compiler when I use certain optimization options.
PC: Beelink micro-pc, AMD 4700U CPU, running ClearLinux.
Compiler: ifort version 2021.6.0.
Sources: downloaded from this Github today (Oct 14)
Build and Run:
~/ALGO/ODE/ODEPACK/odepack-master/src $ ifort -O2 -xcoffeelake -c M_odepack.f90
~/ALGO/ODE/ODEPACK/odepack-master/example $ ifort -O2 -xcoffeelake -I../src lsodpk.f90 ../src/M_odepack.o
~/ALGO/ODE/ODEPACK/odepack-master/example $ ./a.out | grep -i warn
The output contains the following line ten times:
DLSODPK- Warning. Poor iterative algorithm performance seen
When I rebuild, with the additional option -fp-model precise, the warnings go away.
The detailed explanations of what such FP options do can be found in this article.
I was having no problems with these options
COMPILER: gfortran
COMPILER OPTIONS: -fallow-argument-mismatch -Ofast -O3 -funroll-loops -Wimplicit-interface -fPIC -fmax-errors=1 -fcoarray=single
COMPILER: ifort
COMPILER OPTIONS: -fast=2 -fp-model precise -pc64 -align all -error-limit 1 -reentrancy threaded -nogen-interfaces -assume byterecl
COMPILER: nvfortran
COMPILER OPTIONS: -fast -Mbackslash
but when I compiled only with the "fast" options, a test case taking only 0.186 wallclock seconds jumped to over four minutes with ifort, and an apparent infinite loop with gfortran ( I killed it after 33 minutes). Comparing the machine code from the options or profiling should at least isolate it down to a routine; would be good to change the code so anyone building with arbitrary options does not hit the issue even if it gives no performance gain over "good" options. I have -fp-model precise in my scripts and it is the default for --profile release in fpm(1) so I was not seeing that. Still not getting the warning messages though.
Which is the test case that went into an apparent infinite loop? I'd like to explore it further. Is it one of the examples in the test
directory?
example/lsodar.f90
I am so far unable to reproduce the problem with LSODAR.
If you don't mind, please see if changing the INTENT of the argument DY of DCOPY to (IN OUT) fixes the problem.
Which version of Gfortran do you use?
gfortran 10.3. gfortran -Ofast -fallow-argument-mismatch are the only options needed to cause it.
Changing DCOPY did not change it.
Goes into a loop looking returning the same values. Cannot duplicate so far with nvfortran, ifort.
Demonstration program for DLSODAR package
First problem
Problem is dy/dt = ((2*log(y)+8)/t - 5)*y, y(1) = 1
Solution is y(t) = exp(-t**2 + 5*t - 4)
Root functions are:
g1 = dy/dt (root at t = 2.5)
g2 = log(y) - 2.2491 (roots at t = 2.47 and t = 2.53)
itol = 1 rtol = 0.1D-05 atol = 0.1D-05
jt = 2
At t = 0.2000000D+01 y = 0.7389071D+01 error = 0.1534D-04
At t = 0.2469971D+01 y = 0.9479201D+01 error = 0.1631D-04
Root found at t = 0.2469971D+01 jroot = 0 1
Error in t location of root is -0.2865D-04
At t = 0.2469971D+01 y = 0.9479201D+01 error = 0.1631D-04
Root found at t = 0.2469971D+01 jroot = 0 1
Error in t location of root is -0.2865D-04
At t = 0.2469971D+01 y = 0.9479201D+01 error = 0.1631D-04
Here is what goes wrong. The function dumach()
is called to obtain the machine epsilon. With -fast, the code returns zero instead of epsilon.
The original code of dumach() contains a call to the external routine dumsum
to prevent the compiler from optimizing away the loop and returning zero for epsilon. You converted the external routine dumsum
to a contained subroutine in the same file.
Gfortran (I used, on Windows, 10.3 Msys and 11.3 Cygwin 64) is smart enough to see that the call to dumsum
can be optimized away and does so if -fast has been specified. If dumach
and dumsum
are in separately compiled source files, the correct epsilon will probably be returned.
You could either replace all instances of = dumach()
with = epsilon(0d0)
or change dumach
to return epsilon(0d0)
. I have read comments before in posts that there is no portable way of writing Fortran code to compute machine epsilon with the correct result guaranteed.
If you elect to remove thedumach()
code, the dumsum
code can also be removed since it is used only by dumach
.
In a private version where I replaced the work array to a user-defined type with an integer and real component and removed several routines and replaced them with intrinsics I used epsilon; but before committing more changes I wanted a unit test of the routines in place in case I introduced changes. I will make that change, and change dumach() to epsilon(0.0d0).
great job tracking that down.
that eliminated the problems I was seeing while trying to reproduce your issue; but I assume you are still seeing convergence problems?
Yes, with ifort /O2 on Windows the "Poor iterative algorithm performance seen" warnings come up for the LSODPK example. They go away if I add /fp:precise.
Before I made the runs that resulted in my encountering and reporting the warnings, I had already replaced dumach()
with epsilon(0.0d0)
before building.
I think that it is the daxpy.inc
code that is responsible for leading to the performance problem when /fp:precise is not specified. I commented out the two lines that include the Daxpy.inc file and declare the symbol as public, rebuilt with /O2 and linked with the MKL library (to satisfy the reference to daxpy at link time). The warnings went away.
This may be a good time to consider removing the source codes for all the BLAS routines from your codebase. Users can be told to use OpenBLAS, MKL, AMD BLIS, or some other equivalent. Calls to the Linpack routines DGEFA/DGESL and DGBFA/DGBSL could be replaced by suitable calls to Lapack routines DGETRF/DGETRS and DGBTRF/DGBTRS.
I am reopening this issue (which I had earlier closed) since I have found that there is something unusual in either the code or the algorithm itself that causes these "Poor iterative algorithm performance seen" warnings to occur at unexpected times.
In an attempt to understand why these warnings were being seen with some compilers and not with others, I changed the LSODPK example to read in values of RTOL and ATOL instead of fixing their values in the source code as in the original code (both at 1.0d-5 there). Now I can see the "Poor performance" warnings with almost any compiler, and that the occurrence of these warnings (and subsequent failure to produce results) can be quite sensitive to the choice of RTOl and ATOL.
For example, gfortran 12.2.1 on Linux, -O2 -fallow-argument-mismatch, with RTOL = 1.0d-5:
ATOL = 2.4990989d-6: no warnings
2.4990990d-6: seven warnings for mf = 24
2.4990-6: eight warnings for mf = 23, six warnings for mf = 24
This level of sensitivity to the tolerance value is unusual, and I do not know what causes it.
Perhaps other users of the LSODPK solver can help by providing more examples with which to test this peculiar behavior.