GlobalArrays/ga

'zdot wrong' with gfortran-12 on fedora 36 when -O2/-O3 optimization is used


The original issue is here: https://bugzilla.redhat.com/show_bug.cgi?id=2045402

The test error looks like

> Checking zdot ...
  zdot wrong                (86672.993582718176,915971.03870635165)               (86672.994280842890,915971.06605762441)
  zdot wrong                (86672.993582718176,915971.03870635165)               (86672.994280842890,915971.06605762441)

The following Dockerfile can be used to reproduce the problem

FROM fedora:36@sha256:dfb26a5dbc30de897f86e60870a4b4000c7733545e866fd1c03637a931b2d4e3

# Install packaging tools
RUN dnf install -y \
    git fedpkg fedora-packager curl

# Install build dependencies
RUN dnf install -y \
    make automake libtool dos2unix gcc-c++ gcc-gfortran hwloc-devel lapack-devel libibverbs-devel \
    mpich-devel openblas-devel openmpi-devel scalapack-mpich-devel scalapack-openmpi-devel flexiblas-devel

CMD ["/bin/bash"]

To reproduce the error, build and enter the container

docker build -t ga:latest .
docker run -it --rm --name ga ga:latest
curl -LO https://github.com/GlobalArrays/ga/releases/download/v5.8.1/ga-5.8.1.tar.gz
tar zxf ga-5.8.1.tar.gz
cd ga-5.8.1
module load mpi/mpich-x86_64
./configure '--with-scalapack4=-lscalapack' '--with-blas4=-lflexiblas' '--enable-shared' '--enable-static' '--enable-cxx' '--enable-f77' 'CC=gcc' 'LIBS=-lscalapack -lflexiblas' 'CXX=g++' 'FFLAGS=-O2 -I/usr/lib64/gfortran/modules' && make && make NPROCS=2 TESTS="global/testing/test.x" check-TESTS VERBOSE=1

'FFLAGS=-O2 -I/usr/lib64/gfortran/modules'

Redefining FFLAGS is a recipe for disaster and likely to be the culprit here.

Why do you need to set FFLAGS instead of letting autoconf set its value?

The following works for me on the Fedora 36 image

./configure '--with-scalapack4=-lscalapack' '--with-blas4=-lopenblas' '--enable-shared' '--enable-static' '--enable-cxx' '--enable-f77' 'CC=gcc' 'LIBS=-lscalapack -lopenblas' 'CXX=g++'
make ;make NPROCS=2 TESTS=global/testing/test.x check-TESTS VERBOSE=1

If I set 'FFLAGS=-O2 -I/usr/lib64/gfortran/modules', then the zdot error shows up

The Fedora builders on Koji export CFLAGS, CXXFLAGS, FFLAGS, FCFLAGS, LDFLAGS, LT_SYS_LIBRARY_PATH and some more variables at the beginning of the %build section when building an RPM.

The latest commit I was able to find that discusses the use of those flags, https://src.fedoraproject.org/rpms/redhat-rpm-config/c/0d162176e9dba1adc330a9ee561b91c8e5e62cb5, is 4 years old. It says: "It is possible to set RPM macros to change some aspects of the compiler flags. Changing these flags should be used as a last recourse if other workarounds are not available." I remember those statements being stronger in the past, and since the flags are exported by default, I think we should treat any divergence from them as an exception.
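
For reference, the flags the builders export can be inspected on a Fedora system. This is a minimal sketch, assuming the redhat-rpm-config package is installed and provides the build_* macros listed below. The show_build_flags helper and the RPM variable are hypothetical names used only for illustration:

```shell
# Print the default build flags that Fedora's RPM macros expand to.
# RPM can be overridden; it defaults to the real rpm binary.
RPM="${RPM:-rpm}"

show_build_flags() {
    for macro in build_cflags build_cxxflags build_fflags build_ldflags; do
        # rpm --eval expands a macro expression and prints the result
        printf '%s=%s\n' "$macro" "$("$RPM" --eval "%{$macro}")"
    done
}
```

Calling show_build_flags inside the Fedora 36 container should list the optimization level and other options that end up in FFLAGS during an RPM build.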

As mentioned in the Bugzilla issue, I switched to -O1 instead of the default -O2. If "zdot wrong" is caused by a single flag that -O2 enables on top of -O1 (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html), finding that flag may provide some information. It may be possible to find the problematic flag with the Dockerfile below, passing one candidate flag per build:

docker build --build-arg extra_flags="-falign-functions" -t ga:latest .

FROM fedora:36@sha256:dfb26a5dbc30de897f86e60870a4b4000c7733545e866fd1c03637a931b2d4e3

# Install packaging tools
RUN dnf install -y \
    git fedpkg fedora-packager curl

# Install build dependencies
RUN dnf install -y \
    make automake libtool dos2unix gcc-c++ gcc-gfortran hwloc-devel lapack-devel libibverbs-devel \
    mpich-devel openblas-devel openmpi-devel scalapack-mpich-devel scalapack-openmpi-devel flexiblas-devel

# Prepare the build
RUN curl -LO https://github.com/GlobalArrays/ga/releases/download/v5.8.1/ga-5.8.1.tar.gz \
    && tar zxf ga-5.8.1.tar.gz

# Extra compilation flags
ARG extra_flags=""

# Compile and test
RUN . /etc/profile.d/modules.sh \
    && module load mpi/mpich-x86_64 \
    && cd ga-5.8.1 \
    && ./configure '--with-scalapack4=-lscalapack' '--with-blas4=-lflexiblas' '--enable-shared' '--enable-static' '--enable-cxx' '--enable-f77' 'CC=gcc' \
    'LIBS=-lscalapack -lflexiblas' 'CXX=g++' "FFLAGS=-O1 ${extra_flags} -I/usr/lib64/gfortran/modules" \
    && make \
    && make NPROCS=2 TESTS="global/testing/test.x" check-TESTS VERBOSE=1

CMD ["/bin/bash"]
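
The search over candidate flags with the Dockerfile above could be scripted roughly as follows. This is only a sketch: try_flag, try_flags and the DOCKER variable are hypothetical names, and it assumes that a failing test makes make return non-zero so that docker build fails as well.

```shell
# Hypothetical helpers: build the image once per candidate flag, each time
# with -O1 plus that one flag, and report whether the build and test passed.
DOCKER="${DOCKER:-docker}"

try_flag() {
    # A non-zero exit is taken to mean the test failed. It could also be an
    # unrelated build error, so rerun a failing build by hand to see why.
    if "$DOCKER" build --build-arg extra_flags="$1" -t ga:latest . >/dev/null 2>&1; then
        echo "PASS $1"
    else
        echo "FAIL $1"
    fi
}

try_flags() {
    # Try every flag given on the command line, one build per flag.
    for flag in "$@"; do
        try_flag "$flag"
    done
}
```

For example, try_flags -falign-functions -ftree-loop-vectorize -ftree-slp-vectorize would run three builds. These flags are a small illustrative subset; the full list of what -O2 adds is in the GCC documentation linked above.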

Is this failure occurring only on Fedora 36 with GNU compilers version 12? What about earlier Fedora versions with earlier GNU compiler versions?

Adding -fno-tree-vectorize seems to do the trick.
As noted in the "General Improvements" section of https://gcc.gnu.org/gcc-12/changes.html, GCC 12 enables vectorization at -O2.
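
One way to check what a given compiler does at each level is to ask it directly. This is a rough sketch, assuming the gfortran driver accepts GCC's -Q --help=optimizers query. The vectorizer_status helper and the FC override are hypothetical names:

```shell
# Hypothetical helper: list the vectorizer-related optimizations and the
# state the compiler reports for them at a given optimization level.
FC="${FC:-gfortran}"

vectorizer_status() {
    # -Q --help=optimizers prints each optimization flag with its state
    "$FC" -Q "$1" --help=optimizers | grep -E 'tree-(loop|slp)-vectorize'
}
```

Comparing the output of vectorizer_status -O1 and vectorizer_status -O2 under gfortran-11 and gfortran-12 would show whether the vectorizer defaults differ between the two compilers.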

Here is my configuration line

./configure '--with-scalapack=-lscalapack' '--with-blas4=-lflexiblas'   FFLAGS="-O2 -g -I/usr/lib64/gfortran/modules -fno-tree-vectorize"  CC=gcc CXX=g++ FC=gfortran

If this is confirmed, we might be able to hard-wire -fno-tree-vectorize into the GA build.

Is this failure occurring only on Fedora 36 with GNU compilers version 12? What about earlier Fedora versions with earlier GNU compiler versions?

It only happens with gfortran-12. Fedora 35 ships gfortran-11 and does not produce "zdot wrong" with the default %optflags.