uzh-rpg/rpg_vid2e

Segmentation fault error

Closed this issue · 7 comments

When I run the last command, I get this error:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1573049306803/work/aten/src/THC/THCGeneral.cpp line=371 error=98 : invalid device function
Traceback (most recent call last):
  File "esim_torch/generate_events.py", line 72, in <module>
    process_dir(output_folder, path, args)
  File "esim_torch/generate_events.py", line 40, in process_dir
    sub_events = esim.forward(log_image, timestamp_ns)
  File "/code/rpg_vid2e/esim_torch/esim_torch.py", line 49, in forward
    events = self.initialized_forward(images, timestamps)
  File "/code/rpg_vid2e/esim_torch/esim_torch.py", line 73, in initialized_forward
    cumsum = event_counts.view(-1).cumsum(dim=0)
RuntimeError: cuda runtime error (98) : invalid device function at /opt/conda/conda-bld/pytorch_1573049306803/work/aten/src/THC/THCGeneral.cpp:371
  0%|                                                      | 0/56 [00:00<?, ?it/s]
Segmentation fault (core dumped)

What's the reason?

Can you report on your cuda driver and cuda toolkit version?

You have to ensure that the cuda toolkit is supported by the cuda driver version on your system.

Hi, I checked the CUDA driver and CUDA toolkit versions.
nvcc -V reports that the CUDA toolkit version is 10.0, and nvidia-smi reports that the CUDA driver version is 10.1. Is this a problem?
From what I have read, a driver version (10.1) that is newer than the toolkit version (10.0) should usually not cause problems.
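
The comparison described above can be sketched in Python. This is only an illustration: the function names are hypothetical, the sample nvcc line is an assumed example of the "release X.Y" format, and the driver version (10.1) is the one quoted from nvidia-smi in this thread.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return the toolkit release as (major, minor) from `nvcc -V` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports_toolkit(driver_cuda, toolkit_cuda):
    """True when the CUDA version reported by the driver is at least the toolkit version."""
    return driver_cuda >= toolkit_cuda


# Assumed sample of the version line printed by `nvcc -V`.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
toolkit = parse_nvcc_release(sample)

# (10, 1) is the version nvidia-smi reported in this thread.
print(toolkit, driver_supports_toolkit((10, 1), toolkit))
```

Passing this check only rules out a driver that is too old for the toolkit; it says nothing about whether the extension was compiled for the GPU's architecture.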

Yes, that should be no problem. Have you tried compiling pytorch from source before installing esim_torch?

Hi DachunKai,

I updated the setup.py in the installation of esim_torch in the most recent commit. Can you reinstall esim_torch and try again?

Hi, I tried, but it also failed when running pip install esim_torch/:

Processing ./esim_torch
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: esim-cuda
  Building wheel for esim-cuda (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /opt/anaconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/code/rpg_vid2e/esim_torch/setup.py'"'"'; __file__='"'"'/code/rpg_vid2e/esim_torch/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-7xt8hzwy
       cwd: /code/rpg_vid2e/esim_torch/
  Complete output (9 lines):
  running bdist_wheel
  /home/liam/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py:370: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  running build
  running build_ext
  building 'esim_cuda' extension
  :/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda/bin/nvcc -I/home/liam/.local/lib/python3.8/site-packages/torch/include -I/home/liam/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/liam/.local/lib/python3.8/site-packages/torch/include/TH -I/home/liam/.local/lib/python3.8/site-packages/torch/include/THC -I:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda/include -I/opt/anaconda3/include/python3.8 -c esim_cuda_kernel.cu -o build/temp.linux-x86_64-3.8/esim_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=esim_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 -std=c++14
  unable to execute ':/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda/bin/nvcc': No such file or directory
  error: command ':/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda:/usr/local/cuda/bin/nvcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for esim-cuda
  Running setup.py clean for esim-cuda
Failed to build esim-cuda

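Adding the CUDA path to ~/.bashrc or ~/.zshrc with "export CUDA_HOME=/usr/local/cuda" solves the problem. The nvcc path in the log above appears to have been built from a malformed CUDA location (":/usr/local/cuda" repeated several times), which is why the compiler could not be found.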
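
Before reinstalling, the CUDA_HOME value can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of esim_torch: check_cuda_home is a hypothetical helper, and it assumes that nvcc lives at bin/nvcc under CUDA_HOME, as the failing command in the build log suggests.

```python
import os


def check_cuda_home(value):
    """Return a list of problems found with a CUDA_HOME value (empty if none)."""
    problems = []
    if not value:
        problems.append("CUDA_HOME is not set")
        return problems
    # A value such as ":/usr/local/cuda:/usr/local/cuda" is a PATH-style list,
    # not a single directory, so bin/nvcc cannot be resolved from it.
    if os.pathsep in value:
        problems.append("CUDA_HOME contains a path separator; it must be a single directory")
        return problems
    nvcc = os.path.join(value, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        problems.append("nvcc not found at " + nvcc)
    return problems


if __name__ == "__main__":
    # Print every problem with the current environment; no output means the value looks usable.
    for problem in check_cuda_home(os.environ.get("CUDA_HOME", "")):
        print(problem)
```

Run it in a new shell after editing ~/.bashrc or ~/.zshrc, so that the exported value is the one being checked.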