thu-coai/DA-Transformer

Compilation Failed

Opened this issue · 6 comments

  • python 3.7.12
  • pytorch 1.11.0+cu102
  • gcc 5.4

I have modified the cloneable.h file according to the FAQs section, but I still encounter the following error when running the program. Could you please tell me how I can fix it?

 
Traceback (most recent call last):  
  File "/home/env/nat/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1746, in _run_ninja_build
    env=env)
  File "/home/env/nat/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:

RuntimeError: Error building extension 'dag_loss_fn': [1/2] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=dag_loss_fn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/env/nat/lib/python3.7/site-packages/torch/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/TH -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/env/nat/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -DOF_SOFTMAX_USE_FAST_MATH -std=c++14 -c /home/DA-Transformer/fs_plugins/custom_ops/logsoftmax_gather.cu -o logsoftmax_gather.cuda.o 
FAILED: logsoftmax_gather.cuda.o 

/usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=dag_loss_fn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/env/nat/lib/python3.7/site-packages/torch/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/TH -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/env/nat/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -DOF_SOFTMAX_USE_FAST_MATH -std=c++14 -c /home/DA-Transformer/fs_plugins/custom_ops/logsoftmax_gather.cu -o logsoftmax_gather.cuda.o 
/home/DA-Transformer/fs_plugins/custom_ops/logsoftmax_gather.cu:31:23: fatal error: cub/cub.cuh: No such file or directory
compilation terminated.
ninja: build stopped: subcommand failed

It seems that PyTorch 1.11 removes cub from the default include directories. A direct workaround is to use PyTorch 1.10.

I am trying to include cub in PyTorch 1.11 and will update this issue if I find a solution.
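
One possible stopgap in the meantime (an untested sketch, not an official fix): the failing nvcc command already passes -isystem /usr/local/cuda/include, so placing a standalone CUB checkout there lets cub/cub.cuh resolve. The paths and the pinned release below are assumptions; pick a CUB version compatible with your CUDA 10.2 toolkit.

git clone https://github.com/NVIDIA/cub.git /tmp/cub
cd /tmp/cub && git checkout 1.8.0                      # assumed tag name; use a release old enough for CUDA 10.2
sudo cp -r /tmp/cub/cub /usr/local/cuda/include/cub    # <cub/cub.cuh> now resolves via nvcc's existing -isystem path

After copying, remove the cached extension build directory (usually under ~/.cache/torch_extensions) so dag_loss_fn gets recompiled.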


I tried reinstalling PyTorch 1.10.1, but it doesn't work (T⌓T)

I am trying to reproduce your environment... it may take some time before I can find a solution.

If possible, you can also try using CUDA >= 11.0.
Or just skip the CUDA compilation by adding the following arguments:

--torch-dag-loss                  # Use the torch implementation of dag loss instead of the cuda implementation. It may be slower and consume more memory.
--torch-dag-best-alignment        # Use the torch implementation of best-alignment instead of the cuda implementation. It may be slower and consume more memory.
--torch-dag-logsoftmax-gather     # Use the torch implementation of logsoftmax-gather instead of the cuda implementation. It may be slower and consume more memory.
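
For example (a sketch only; the data directory and the other training options are placeholders, not the exact command from the README):

fairseq-train data-bin/your_dataset \
    --user-dir fs_plugins \
    ... your other options ... \
    --torch-dag-loss \
    --torch-dag-best-alignment \
    --torch-dag-logsoftmax-gather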


Okay, thanks!

@jchang98 I have pushed an update which manually includes the cub library. Please re-clone this repo and try again.

@sudanl Can you run the script with only one GPU (a single process)?
The error only says that the CUDA program was not compiled correctly, but it does not show the real error.
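
For example (a sketch; CUDA_VISIBLE_DEVICES and --distributed-world-size are standard options, the rest is a placeholder):

CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/your_dataset \
    --user-dir fs_plugins \
    --distributed-world-size 1 \
    ... your other options ...

Running in a single process should print the original nvcc/ninja error instead of the generic message.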