earth-system-radiation/rte-rrtmgp

Intel Compiler Optimization Problem on AMD EPYC (Milan)

Closed · 10 comments

Problem:
The Intel compiler causes problems when building ICON with RTE-RRTMGP on "Levante", the new HPC system at DKRZ, which consists of AMD EPYC Milan processors. A build with the Intel compiler works only if almost all optimizations are turned off. In particular, the compiler options -m64 -march=core-avx2 do not work at all.

The error occurs in the subroutine interpolation in file rte-rrtmgp/rrtmgp/kernels/mo_gas_optics_kernels.F90 in the following nested loop:


eta = merge(col_gas(icol,ilay,igases(1)) / col_mix(itemp,icol,ilay,iflav), 0.5_wp, &
                        col_mix(itemp,icol,ilay,iflav) > 2._wp * tiny(col_mix))
loceta = eta * float(neta-1)
jeta(itemp,icol,ilay,iflav) = min(int(loceta)+1, neta-1)

The result of the calculations for jeta is sometimes jeta=-2147483647, which leads to a segmentation fault.
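
For context, the reported value is consistent with a float-to-integer conversion of a non-finite or out-of-range number: on x86 such a conversion typically yields -2147483648, and adding 1 gives -2147483647. The following is a minimal stand-alone sketch, not the library code, of the same index computation with the division guarded by an explicit if and a range check on the result. The program name, the array contents, and the value of neta are assumptions chosen for illustration.

```fortran
program eta_guard_sketch
  use, intrinsic :: iso_fortran_env, only: wp => real64
  implicit none
  integer, parameter :: neta = 9           ! hypothetical table size
  real(wp) :: col_gas(3), col_mix(3), eta, loceta
  integer  :: i, jeta

  ! Hypothetical inputs; the second entry would divide by zero if unguarded
  col_gas = [1.0_wp, 0.0_wp, 2.0_wp]
  col_mix = [4.0_wp, 0.0_wp, 2.0_wp]

  do i = 1, size(col_mix)
    ! Divide only when the denominator is safely above the smallest normal number
    if (col_mix(i) > 2._wp * tiny(col_mix)) then
      eta = col_gas(i) / col_mix(i)
    else
      eta = 0.5_wp
    end if
    loceta = eta * real(neta-1, wp)
    jeta   = min(int(loceta)+1, neta-1)

    ! A valid table index must lie in 1..neta-1
    if (jeta < 1 .or. jeta > neta-1) error stop 'jeta out of range'
    print '(a,i0,a,i0)', 'i=', i, ' jeta=', jeta
  end do
end program eta_guard_sketch
```

Compiling a sketch like this with and without optimization is one way to check whether a given compiler version keeps the guard intact, although a small scalar loop may not trigger the same vectorization as the full kernel.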

Workarounds tried so far:

  1. Suppress optimizations for the module mo_gas_optics_kernels.F90 by placing !DIR$ NOOPTIMIZE at the beginning of the module. This works, but the performance degradation is unacceptable.
  2. Replace the declaration "REAL(wp), DIMENSION(2, ncol, nlay,nflav), INTENT(out) :: col_mix" with the assumed-shape variant "REAL(wp), DIMENSION(:,:,:,:), INTENT(out) :: col_mix". This works much better than the first workaround, but the performance is still not as expected.

Next steps:

  1. Try the gcc compiler.
  2. Try to reproduce the error with the stand-alone tests of RTE-RRTMGP.

@panosadamids-dkrz I wonder if this is related to the problem found in single precision by @MennoVeerman (#39 (comment), #39)

Anyway I will be coming back to this code in the next few weeks.

I was able to reproduce the above problem with the integrated stand-alone RTE-RRTMGP tests.
The problem occurs when compiling with -O2 or -O3.
The compile options -O0 and -O1 work correctly.

@panosadamids-dkrz Well, that's good news in the sense that we'll know when it's fixed. Can you explain to me how to compile the code in your setup on Levante, i.e. what modules to load and environment variables to set for the stand-alone tests?

@RobertPincus here are the steps I am following on Levante

First I activate the Python virtual environment, which I created with Erik's help in my home directory (k202061):

. ~k202061/venv/bin/activate

Afterwards I load the Intel compiler (version 2022.1) with

module load intel-oneapi-compilers

and set the following environment variables:

export FC="ifort"

export FCFLAGS="-m64 -O3 -g -traceback -heap-arrays -assume realloc_lhs -extend-source 132"

export FCINCLUDE="-I/home/k/k202061/rte-rrtmgp"

export RRTMGP_ROOT="/home/k/k202061/rte-rrtmgp"

export NFHOME="/sw/spack-levante/netcdf-fortran-4.5.3-k6xq5g"

export NCHOME="/sw/spack-levante/netcdf-c-4.8.1-2k3cmu"

export LIKWID_ROOT=""

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$NFHOME/lib:$NCHOME/lib:$LIKWID_ROOT/lib"

Finally I run make to compile the code and run the tests. The crash then occurs in mo_gas_optics_kernels inside the subroutine interpolate3D_byflav. The cause is the values of jeta, which are calculated in the subroutine interpolation, as explained in this issue.

With
export FCFLAGS="-m64 -O1 -g -traceback -heap-arrays -assume realloc_lhs -extend-source 132"
the tests pass. It also works with -O3 -fp-model strict, but I haven't yet compared CPU times to see how much -fp-model strict affects performance.

Dmitry faced the same issue on JUWELS BOOSTER (on an AMD EPYC 7402 processor) with Intel compiler versions 2022.1 and 2021.4. He thinks it is a vectorization bug in the Intel compiler and has opened the following issue:

https://community.intel.com/t5/Intel-Fortran-Compiler/Compiler-vectorization-bug/m-p/1362591#M160235

@panosadamids-dkrz As we replaced the merge with an if in a recent change to the develop branch, I would be curious to know whether this issue has now been resolved.

@RobertPincus I will check it!

@RobertPincus: This issue has now been resolved. I have built the latest version on Levante (AMD EPYC Milan) with the high optimisation flag -O3, as mentioned above, and make, tests and check were all successful.

OK, so addressed by #223 and will close when moved to main

@RobertPincus if I am not mistaken, this particular issue was addressed in #170 and the workaround is still relevant, even with changes from #223.

It looks like workaround #170 is indeed needed even after replacing the merge with if ... else. Without it and using -march=core-avx2 (which I guess it defaults to on Milan?), ifort seemed to generate the same code (a packed division and a blend) for both merge and if versions, so even with if there could potentially be a division by zero or other unpleasantries. It feels like a clear bug in the compiler.
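
One defensive pattern against this kind of code generation, shown here only as a sketch and not as a description of what #170 actually does, is to make the division itself safe for every element, so that it no longer matters whether the compiler evaluates it for masked-out lanes. The names safe_eta and denom and the test values are hypothetical.

```fortran
program safe_division_sketch
  use, intrinsic :: iso_fortran_env, only: wp => real64
  implicit none
  real(wp) :: col_gas(4), col_mix(4), eta(4)

  ! Hypothetical inputs; two denominators are exactly zero
  col_gas = [1.0_wp, 0.0_wp, 2.0_wp, 3.0_wp]
  col_mix = [4.0_wp, 0.0_wp, 2.0_wp, 0.0_wp]

  eta = safe_eta(col_gas, col_mix)
  print '(4f8.4)', eta

  ! No element may be NaN or infinite
  if (any(eta /= eta) .or. any(abs(eta) > huge(eta))) error stop 'non-finite eta'

contains

  elemental function safe_eta(gas, mix) result(eta)
    real(wp), intent(in) :: gas, mix
    real(wp) :: eta, denom
    logical  :: valid
    valid = mix > 2._wp * tiny(mix)
    ! Substitute a harmless denominator where the real one is too small,
    ! so the division is well defined even if it is computed unconditionally
    denom = merge(mix, 1._wp, valid)
    eta   = merge(gas / denom, 0.5_wp, valid)
  end function safe_eta
end program safe_division_sketch
```

The trade-off is one extra select per element; whether that is cheaper than disabling vectorization for the loop would need to be measured on the target system.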