NVIDIA/nccl

error: parameter packs not expanded with '...'

bisqwit opened this issue · 11 comments

Attempting to build nccl on Debian testing produces dozens of these errors:

    /usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
      530 |         operator=(_Functor&& __f)
          |                                                                                                                                                  ^ 
    /usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’

make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:84: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_prod_f32.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:89: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_prod_f64.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:39: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_sum_f32.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:24: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_sum_i64.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:69: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_prod_i64.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:74: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_prod_u64.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:54: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_prod_u8.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:19: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_sum_u32.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:29: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_sum_u64.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:9: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_sum_u8.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:99: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_min_u8.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:79: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_prod_f16.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:34: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_sum_f16.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:59: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_prod_i32.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:44: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_sum_f64.o] Error 1
make[5]: *** [/usr/local/src/nccl/build/obj/collectives/device/Makefile.rules:4: /usr/local/src/nccl/build/obj/collectives/device/sendrecv_sum_i8.o] Error 1
make[5]: Leaving directory '/usr/local/src/nccl/src/collectives/device'
make[4]: *** [Makefile:50: /usr/local/src/nccl/build/obj/collectives/device/colldevice.a] Error 2
make[4]: Leaving directory '/usr/local/src/nccl/src'
make[3]: *** [Makefile:25: src.build] Error 2
make[3]: Leaving directory '/usr/local/src/nccl'
make[2]: *** [Makefile:26: build] Error 2
make[2]: Leaving directory '/usr/local/src/nccl/pkg/debian'
make[1]: *** [Makefile:23: debian.build] Error 2
make[1]: Leaving directory '/usr/local/src/nccl/pkg'
make: *** [Makefile:28: pkg.debian.build] Error 2

Same goes for allreduce (tried this first with the version included with pytorch).

$ nvcc  --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

$ g++ --version
g++ (Debian 11.2.0-16) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux bookworm/sid
Release:        testing
Codename:       bookworm

Installing GCC version 10 helped me:
sudo apt install gcc-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10
sudo update-alternatives --config gcc

Select version 10 for compiling; I also set g++ to version 10.

I forgot to mention that after changing the version, I removed all files in the build directory and started over with cmake, then make.

That does not help, at least not in Debian.
However, what does help is making the following two changes in /usr/include/c++/*/bits/std_function.h :

Line 433+ (approximate):

  template<typename _Functor,
           typename _Constraints = _Requires<_Callable<_Functor>>>
    function(_Functor&& __f)
    //noexcept(_Handler<_Functor>::template _S_nothrow_init<_Functor>()) // CUDA BOTCHES THIS
    : _Function_base()

Line 529+ (approximate):

  template<typename _Functor>
    _Requires<_Callable<_Functor>, function&>
    operator=(_Functor&& __f)
    //noexcept(_Handler<_Functor>::template _S_nothrow_init<_Functor>()) // CUDA BOTCHES THIS
    {
      function(std::forward<_Functor>(__f)).swap(*this);
      return *this;
    }

Commenting out the indicated part fixes the compilation. For some reason, NVCC botches the compilation when that part is present: it preprocesses the C++ code and erroneously changes the template signature in a way that does not and cannot compile.
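
To check whether an installed header still has the clause described above, without editing anything, a small sketch follows. The function name has_active_nothrow_clause is hypothetical, and the search string is copied from the snippet in this comment; other libstdc++ versions may spell the clause differently.

```shell
# Succeeds if the header at $1 contains the noexcept clause on a line
# that is not already commented out with //.
# (Hypothetical helper; the search string is taken from the comment above.)
has_active_nothrow_clause() {
  grep -v '^[[:space:]]*//' "$1" |
    grep -qF 'noexcept(_Handler<_Functor>::template _S_nothrow_init<_Functor>())'
}

# Report on every libstdc++ version installed under /usr/include/c++.
for header in /usr/include/c++/*/bits/std_function.h; do
  [ -f "$header" ] || continue
  if has_active_nothrow_clause "$header"; then
    echo "$header: clause present"
  else
    echo "$header: clause absent or commented out"
  fi
done
```

This only reports; it does not modify the headers.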

Hi,
I ran into the same problem as you.
From what I can tell, @bisqwit identified the issue correctly (my header files are different but contain the same lines, which throw the same errors), though modifying the header files installed by the package manager wasn't a good solution in my case.

Switching to gcc-10/g++-10 did eventually fix it for me (Ubuntu 22.04 in this case), since its header files do not seem to contain the lines that are problematic for nvcc. nvcc does NOT seem to respect the g++ setup made by update-alternatives (or I made a setup error), since it continued to use the GCC 11 header files even after I switched to version 10 via update-alternatives.

Manually setting the CC and CXX variables on the make command line worked though, e.g.:
make -j src.build CC=gcc-10 CXX=g++-10
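
A hedged sketch of the same idea for machines where the installed compiler versions vary: the helper below (pick_host_cxx is a hypothetical name, not part of NCCL) returns the first compiler from a preference list that exists on PATH, and the script prints the matching make invocation. It assumes the NCCL makefiles pass CXX on to nvcc as the host compiler, which is consistent with the report above but is not confirmed in this thread.

```shell
# Print the first command from the argument list that exists on PATH.
# Returns non-zero if none of them is found.
pick_host_cxx() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Prefer g++-10, fall back to g++-9 (the preference order is an assumption).
if host_cxx=$(pick_host_cxx g++-10 g++-9); then
  # Derive the matching C compiler name, e.g. g++-10 -> gcc-10.
  host_cc="gcc${host_cxx#g++}"
  echo "make -j src.build CC=$host_cc CXX=$host_cxx"
else
  echo "no suitable host compiler found on PATH" >&2
fi
```

For builds that call nvcc directly, nvcc's -ccbin (--compiler-bindir) option selects the host compiler in the same way.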

I was having this same error in the tiny-cuda-nn portion of instant-nerf... thank you @bisqwit

@GenosW Switching to gcc 10 worked perfectly for me. Thanks for the tip!

KYANJO commented

Commenting out the lines works like magic.

I tried all the solutions but I am still getting the error below. Is there any solution to this?

make CXX=nvcc CC=nvcc -C external
make[1]: Entering directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external'
make -C zlib
make[2]: Entering directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/zlib'
nvcc -c -o build/adler32.o ./adler32.c
nvcc -c -o build/crc32.o ./crc32.c
nvcc -c -o build/deflate.o ./deflate.c
nvcc -c -o build/infback.o ./infback.c
nvcc -c -o build/inffast.o ./inffast.c
nvcc -c -o build/inflate.o ./inflate.c
nvcc -c -o build/inftrees.o ./inftrees.c
nvcc -c -o build/trees.o ./trees.c
nvcc -c -o build/zutil.o ./zutil.c
nvcc -c -o build/compress.o ./compress.c
nvcc -c -o build/uncompr.o ./uncompr.c
nvcc -c -o build/gzclose.o ./gzclose.c
nvcc -c -o build/gzlib.o ./gzlib.c
nvcc -c -o build/gzread.o ./gzread.c
nvcc -c -o build/gzwrite.o ./gzwrite.c
ar rc build//libz.a ./build/adler32.o ./build/crc32.o ./build/deflate.o ./build/infback.o ./build/inffast.o ./build/inflate.o ./build/inftrees.o ./build/trees.o ./build/zutil.o ./build/compress.o ./build/uncompr.o ./build/gzclose.o ./build/gzlib.o ./build/gzread.o ./build/gzwrite.o
make[2]: Leaving directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/zlib'
cp zlib/build/libz.a lib/
make -C tinyxml
make[2]: Entering directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/tinyxml'
nvcc -c tinystr.cpp -o build/tinystr.o
nvcc -c tinyxml.cpp -o build/tinyxml.o
nvcc -c tinyxmlerror.cpp -o build/tinyxmlerror.o
nvcc -c tinyxmlparser.cpp -o build/tinyxmlparser.o
ar rc build/libtinyxml.a ./build/tinystr.o ./build/tinyxml.o ./build/tinyxmlerror.o ./build/tinyxmlparser.o
make[2]: Leaving directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/tinyxml'
cp tinyxml/build/libtinyxml.a lib/
make[1]: Leaving directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external'
nvcc -O3 -std=c++17 --forward-unknown-to-host-compiler -pthread --forward-unknown-to-host-compiler -x cu -O3 -std=c++17 --generate-code=arch=compute_60,code=[compute_60,sm_60] --extended-lambda --expt-relaxed-constexpr -rdc=true -Xcudafe "--diag_suppress=implicit_return_from_non_void_function --display_error_number --diag_suppress=20014 --diag_suppress=20011" -DPLATFORM_CPU_SISD -DPLATFORM_GPU_CUDA -DDEFAULT_FLOATING_POINT_TYPE=float -fPIC -Isrc/ -c src/communication/mpiManager.cpp -o src/communication/mpiManager.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
make: *** [Makefile:46: src/communication/mpiManager.o] Error 1

same issue

I am closing this issue. This is not a problem in NCCL but, rather, an incompatibility between certain versions of CUDA and gcc.
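
To see which pair of versions is in play before reporting or working around the problem, a small sketch follows. The function names cuda_release and gxx_major are hypothetical; whether a given pair is supported has to be checked against NVIDIA's documentation for that CUDA release.

```shell
# Extract the CUDA release (e.g. "11.4") from `nvcc --version` output on stdin.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Extract the major version from `g++ -dumpversion` output on stdin.
gxx_major() {
  cut -d. -f1
}

# Only query the tools if both are installed.
if command -v nvcc >/dev/null 2>&1 && command -v g++ >/dev/null 2>&1; then
  echo "CUDA release: $(nvcc --version | cuda_release)"
  echo "g++ major:    $(g++ -dumpversion | gxx_major)"
fi
```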

Note: If you are using cmake, the compiler setting is cached. This means that simply setting the CC and CXX environment variables will not change anything unless you clear the cache.

Alternatively, you can invoke cmake like this:

cmake -D CMAKE_C_COMPILER=`which gcc-10` -D CMAKE_CXX_COMPILER=`which g++-10` .

which will invalidate the caches for you.
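
For the other route mentioned in the note (clearing the cache by hand), a minimal sketch, assuming the build directory is the one cmake was originally run in. The function name clear_cmake_cache is hypothetical.

```shell
# Remove CMake's cached configuration from the build directory given as $1,
# leaving all other files in place.
clear_cmake_cache() {
  build_dir=$1
  # Refuse an empty argument so we never touch paths under "/".
  [ -n "$build_dir" ] || return 1
  rm -rf -- "$build_dir/CMakeCache.txt" "$build_dir/CMakeFiles"
}

# Example (not run here): clear the cache, then reconfigure with GCC 10.
#   clear_cmake_cache build
#   CC=gcc-10 CXX=g++-10 cmake -S . -B build
```

Projects that use CMake's native CUDA language support may also need CMAKE_CUDA_HOST_COMPILER set to the same g++ so that nvcc picks it up; whether that applies depends on the project.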

The error may be due to nvcc supporting a different set of C++ standards than the host compiler (such as g++). If the code uses features from C++11 or later but the nvcc invocation does not specify the corresponding C++ standard, compilation errors can result.
Try:

nvcc --std=c++14 your_program.cu -o your_program