pytorch/FBGEMM

GCC: compilation of the AVX512 intrinsics fails (-Werror=uninitialized)

bulvara opened this issue · 11 comments

Dear devs,
I'm using Arch Linux (default GCC) on an AMD Ryzen, so I don't even need the AVX512 instructions.

Here is an example error message with GCC:

In function '__m512i _mm512_inserti64x4(__m512i, __m256i, int)',
    inlined from '__m512i fbgemm::internal::permute_row(__m512i)' at ./pytorch/third_party/fbgemm/src/UtilsAvx512.cc:606:28,
    inlined from 'void fbgemm::internal::core_transpose_16x32_block_i8(__m512i*, __m512i*)' at ./pytorch/third_party/fbgemm/src/UtilsAvx512.cc:665:21,
    inlined from 'void fbgemm::internal::transpose_16x32_block(const uint8_t*, int, uint8_t*, int, int, int) [with bool MREM = false; bool NREM = true]' at ./pytorch/third_party/fbgemm/src/UtilsAvx512.cc:937:32:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.0.1/include/avx512fintrin.h:6137:10: error: '__Y' is used uninitialized [-Werror=uninitialized]
 6137 |   return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 6138 |                                                     (__v4di) __B,
      |                                                     ~~~~~~~~~~~~~
 6139 |                                                     __imm,
      |                                                     ~~~~~~
 6140 |                                                     (__v8di)
      |                                                     ~~~~~~~~
 6141 |                                                     _mm512_undefined_epi32 (),
      |                                                     ~~~~~~~~~~~~~~~~~~~~~~~~~~
 6142 |                                                     (__mmask8) -1);
      |                                                     ~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.0.1/include/avx512fintrin.h: In function 'void fbgemm::internal::transpose_16x32_block(const uint8_t*, int, uint8_t*, int, int, int) [with bool MREM = false; bool NREM = true]':
/usr/lib/gcc/x86_64-pc-linux-gnu/12.0.1/include/avx512fintrin.h:206:11: note: '__Y' was declared here
  206 |   __m512i __Y = __Y;
      |           ^~~

When using Clang instead, the configure step fails because the AVX512 intrinsics are disabled when compiling with -march=native on my platform.

Is there any way to completely patch out AVX512?

Could you share the build command? Currently FBGEMM (CPU) primarily supports Intel CPUs. For AMD CPUs (e.g., Ryzen, which only supports AVX2), we could probably remove the fbgemm_avx512 build target, i.e. the

  target_compile_options(fbgemm_avx512 PRIVATE ...)

block in FBGEMM's CMakeLists.txt.
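Purely as an untested sketch of where to look (assuming the vendored copy under the pytorch tree, as in the error paths above): the fbgemm_avx512 target should be defined in FBGEMM's own CMakeLists.txt, so one option is to locate it and comment out the target or its AVX512-specific compile options by hand.

  # Hypothetical starting point, run from the pytorch source root:
  grep -n "fbgemm_avx512" third_party/fbgemm/CMakeLists.txt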

I'm building pytorch following https://github.com/archlinux/svntogit-community/blob/packages/python-pytorch/trunk/PKGBUILD

Is setup.py using BUILD.bazel or CMakeLists.txt? I've tried to patch the Bazel file, but a myriad of deletions would be necessary there.

I am also seeing this error. I have an Intel CPU.

@Bidski: do you have more information on the Intel CPU and the compiler you are using? We currently require an Intel CPU after Broadwell (AVX2) and a compiler supporting AVX512.

@jianyuh my apologies, it's been a long day and I forgot I was working on a remote PC 😟

I am actually experiencing this issue on an AMD CPU. Is it possible to build this on an AMD CPU?

I will test further on an Intel CPU tomorrow to confirm that it isn't affected.

I'm seeing this on Fedora 36 with GCC 12.1.

Should work with:

  export CFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
  export CXXFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
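
A minimal sketch of how that fits into a from-source pytorch build (assuming a plain setup.py-driven build, which I believe is what the Arch PKGBUILD ultimately invokes; adjust for your own setup):

  # Downgrade the offending warnings from errors, then build as usual.
  export CFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
  export CXXFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
  python setup.py build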

@jianyuh Intel CPU build seems fine, sorry for the alarm

The CFLAGS and CXXFLAGS suggested by @kgizdov do allow the build to succeed on the AMD CPU.

This is a GCC regression. Here's the bug report for reference:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593

@bulvara did the above fix regarding CFLAGS and CXXFLAGS work for you?

This thread was a lifesaver for me, thank you. One caveat: simply re-running cmake (at the pytorch build level, in my case) with different CXXFLAGS in my environment did not reconfigure with the new flags. I had to make sure to clean everything first before re-running cmake.
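
For anyone in the same spot, roughly the sequence I mean, assuming a setup.py-driven pytorch build whose CMake cache lives in the build/ directory (if you invoke cmake directly, delete the CMakeCache.txt in your build directory instead):

  # Drop the cached CMake configuration so the new flags are actually picked up.
  rm -rf build
  export CFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
  export CXXFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
  python setup.py build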