qwopqwop200/GPTQ-for-LLaMa

fastest-inference-4bit fails to build

lee-b opened this issue · 3 comments

lee-b commented

Fails with "quant_cuda_kernel.cu(259): error: no instance of overloaded function "atomicAdd" matches the argument list".

Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] /usr/bin/nvcc -I/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include -I/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include/TH -I/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include/THC -I/home/lb/.pyenv/versions/textgenwebui/include -I/home/lb/.pyenv/versions/3.9.16/include/python3.9 -c -c /home/lb/GIT/text-generation-webui/repositories/GPTQ-for-LLaMa/quant_cuda_kernel.cu -o /home/lb/GIT/text-generation-webui/repositories/GPTQ-for-LLaMa/build/temp.linux-x86_64-3.9/quant_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 -std=c++17
FAILED: /home/lb/GIT/text-generation-webui/repositories/GPTQ-for-LLaMa/build/temp.linux-x86_64-3.9/quant_cuda_kernel.o
/usr/bin/nvcc -I/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include -I/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include/TH -I/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include/THC -I/home/lb/.pyenv/versions/textgenwebui/include -I/home/lb/.pyenv/versions/3.9.16/include/python3.9 -c -c /home/lb/GIT/text-generation-webui/repositories/GPTQ-for-LLaMa/quant_cuda_kernel.cu -o /home/lb/GIT/text-generation-webui/repositories/GPTQ-for-LLaMa/build/temp.linux-x86_64-3.9/quant_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 -std=c++17
/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
(61): here
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include/c10/core/TensorImpl.h(77): here

/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
(61): here
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/include/ATen/core/qualified_name.h(73): here

/home/lb/GIT/text-generation-webui/repositories/GPTQ-for-LLaMa/quant_cuda_kernel.cu(259): error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (__half *, c10::Half)
detected during instantiation of "void VecQuant4MatMulKernel_G(const half2 *, const int *, scalar_t *, const scalar_t *, const int *, const int *, int, int, int, int, int) [with scalar_t=c10::Half]"
(166): here

1 error detected in the compilation of "/home/lb/GIT/text-generation-webui/repositories/GPTQ-for-LLaMa/quant_cuda_kernel.cu".
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/lb/GIT/text-generation-webui/repositories/GPTQ-for-LLaMa/setup_cuda.py", line 4, in <module>
setup(
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/setuptools/command/install.py", line 67, in run
self.do_egg_install()
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/setuptools/command/install.py", line 109, in do_egg_install
self.run_command('bdist_egg')
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/command/install_lib.py", line 107, in build
self.run_command('build_ext')
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/home/lb/.pyenv/versions/3.9.16/lib/python3.9/distutils/command/build_ext.py", line 529, in build_extension
objects = self.compiler.compile(sources,
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/home/lb/.pyenv/versions/textgenwebui/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
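
For context on the error above: the -gencode flags show that the build targets compute capability 6.1, and CUDA only provides a native atomicAdd for __half on compute capability 7.0 and higher, which is consistent with the missing overload. Below is a minimal Python sketch that pulls the target architectures out of an nvcc command line and lists those below 7.0. The helper name and the constant are hypothetical, chosen for illustration; they are not part of GPTQ-for-LLaMa or PyTorch.

```python
import re

# Assumption for this sketch: 70 stands for compute capability 7.0, the first
# architecture with a native atomicAdd overload for __half.
NATIVE_HALF_ATOMIC_ADD = 70


def targets_without_half_atomic_add(nvcc_command: str) -> list:
    """Return the sorted, de-duplicated -gencode targets below 7.0."""
    # Match the code= part of each -gencode flag (sm_XX or compute_XX).
    archs = {int(m) for m in re.findall(r"code=(?:sm|compute)_(\d+)", nvcc_command)}
    return sorted(a for a in archs if a < NATIVE_HALF_ATOMIC_ADD)


# Flags copied from the failing command in the log above.
flags = (
    "-gencode=arch=compute_61,code=compute_61 "
    "-gencode=arch=compute_61,code=sm_61 -std=c++17"
)
print(targets_without_half_atomic_add(flags))  # [61]
```

An empty list would suggest the half atomicAdd error has a different cause than the target architecture.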

You will have to fix this manually.

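#ifdef __CUDA_ARCH__
#if __CUDA_ARCH__ < 700
// adapted from https://github.com/torch/cutorch/blob/master/lib/THC/THCAtomics.cuh
// GPUs below compute capability 7.0 have no native atomicAdd for half, so
// emulate it with a compare-and-swap loop on the enclosing aligned 32-bit word.
__device__ __forceinline__ void atomicAdd(__half* address, c10::Half val) {
    // Round the address down to the 4-byte word that contains the target half.
    unsigned int *address_as_ui = reinterpret_cast<unsigned int *>(reinterpret_cast<char *>(address) - (reinterpret_cast<size_t>(address) & 2));
    unsigned int old = *address_as_ui;
    unsigned int assumed;

    do {
        assumed = old;
        // Extract the 16 raw bits of the target half (upper or lower part of the word).
        unsigned short bits = reinterpret_cast<size_t>(address) & 2 ? (old >> 16) : (old & 0xffff);
        // Add in floating point: the raw bits must be decoded as a half first,
        // otherwise the sum is computed on the integer bit pattern.
        float sum = __half2float(__ushort_as_half(bits)) + static_cast<float>(val);
        unsigned int hsum = __half_as_ushort(__float2half(sum));
        old = reinterpret_cast<size_t>(address) & 2
                 ? (old & 0xffff) | (hsum << 16)
                 : (old & 0xffff0000) | hsum;
        old = atomicCAS(address_as_ui, assumed, old);

    // Note: uses integer comparison to avoid hang in case of NaN (since NaN != NaN)
    } while (assumed != old);
}
#endif
#endif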

This goes into the quant_cuda_kernel.cu file. I am still stuck on Triton, though, and am trying to build it from source. I thought this was going to run inference with CUDA, lol.
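
To make the word packing in the atomicAdd workaround above more concrete, here is a single-threaded Python sketch of the bit manipulation that such a compare-and-swap loop performs on each attempt. It is only an illustration: it has no atomicity or retry loop, the function names are hypothetical, and it assumes the half values are IEEE 754 binary16 packed two per 32-bit word.

```python
import struct


def half_to_bits(x: float) -> int:
    """Encode a float as IEEE 754 binary16 and return its 16 raw bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bits_to_half(bits: int) -> float:
    """Decode 16 raw bits as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def add_half_in_word(word: int, upper: bool, val: float) -> int:
    """Add val to the half stored in the upper or lower 16 bits of a 32-bit word."""
    bits = (word >> 16) if upper else (word & 0xFFFF)
    # The sum is taken on the decoded value, then re-encoded to 16 bits.
    new_bits = half_to_bits(bits_to_half(bits) + val)
    if upper:
        return (word & 0xFFFF) | (new_bits << 16)
    return (word & 0xFFFF0000) | new_bits


# A word holding 1.0 in the lower half and 2.0 in the upper half.
word = half_to_bits(1.0) | (half_to_bits(2.0) << 16)
word = add_half_in_word(word, upper=False, val=0.5)
print(bits_to_half(word & 0xFFFF), bits_to_half(word >> 16))  # 1.5 2.0
```

Only the selected 16 bits change; the neighbouring half in the same word is carried over untouched, which is why the kernel version has to compare and swap the whole 32-bit word.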

lee-b commented

This fixes it for my build at least, thanks.

Yup. I can't get Triton running, though. It gets stuck trying to compile the Triton kernel during the optimization step, so models load just fine but inference never runs.