How to run SMOKE on an RTX 3070 GPU?

Question

zhaoxiaolong2020 opened this issue 4 years ago · 2 comments

zhaoxiaolong2020 commented 4 years ago

As far as I know, SMOKE requires:
Ubuntu 16.04
Python 3.7
Pytorch 1.3.1
CUDA 10.0

However, my PC has an RTX 3070 GPU, which only supports CUDA 11.1 and later, so I tried configuring the environment as follows:

(base) zxl@R9000P:~/mywork/MANA-AI/DCNv2_latest$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

Ubuntu 18.04
Python 3.8
PyTorch 1.8.1 and 1.9.0
CUDA 11.1

In both the PyTorch 1.8.1 and 1.9.0 environments, I ran:

python setup.py build develop

This produced the following errors:

/home/zxl/mywork/MANA-AI/smoke_mono_3d/smoke/csrc/cuda/dcn_v2_cuda.cu(127): error: identifier "THCudaBlas_SgemmBatched" is undefined
/home/zxl/mywork/MANA-AI/smoke_mono_3d/smoke/csrc/cuda/dcn_v2_cuda.cu(275): error: identifier "THCudaBlas_Sgemm" is undefined
/home/zxl/mywork/MANA-AI/smoke_mono_3d/smoke/csrc/cuda/dcn_v2_cuda.cu(329): error: identifier "THCudaBlas_Sgemv" is undefined
3 errors detected in the compilation of "/home/zxl/mywork/MANA-AI/smoke_mono_3d/smoke/csrc/cuda/dcn_v2_cuda.cu".
error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1
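
All three undefined identifiers in the log above share the `THCudaBlas_` prefix, which suggests the source still calls legacy BLAS wrappers that the installed PyTorch headers no longer declare. Before porting, it can help to list every such call. The following is a minimal sketch, not part of SMOKE or DCNv2; the helper names `find_thc_blas_calls` and `scan_tree` and the sample text are hypothetical.

```python
import re
from pathlib import Path

# Matches legacy wrapper names such as THCudaBlas_Sgemm (prefix taken from the log).
THC_BLAS_PATTERN = re.compile(r"\bTHCudaBlas_\w+")


def find_thc_blas_calls(source_text):
    """Return (line_number, identifier) pairs for each THCudaBlas_* use."""
    hits = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        for match in THC_BLAS_PATTERN.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


def scan_tree(root):
    """Scan every .cu file under root (hypothetical helper, assumed layout)."""
    return {
        str(path): find_thc_blas_calls(path.read_text(errors="ignore"))
        for path in Path(root).rglob("*.cu")
    }


if __name__ == "__main__":
    # Made-up sample text; real input would be the contents of dcn_v2_cuda.cu.
    sample = "THCudaBlas_SgemmBatched(state);\n// comment\nTHCudaBlas_Sgemv(state);\n"
    for line_number, name in find_thc_blas_calls(sample):
        print(line_number, name)
```

Calling `scan_tree("smoke/csrc/cuda")` from the repository root would list the lines to revisit, assuming the directory layout shown in the error paths.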

I found the following workaround online:

git clone https://github.com/jinfagang/DCNv2_latest.git
cd DCNv2_latest
python setup.py build develop

With CUDA 11.1 and PyTorch 1.8.1 or 1.9.0 this builds successfully, but when I run the test program

python testcuda.py

I get the following error:

raise RuntimeError(msg)
RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[ 0.4043,  0.0048, -0.0100,  ...,  0.0000,  0.0000,  0.0000],
       [ 0.1935,  0.0695,  0.0132,  ...,  0.0000,  0.0000,  0.0000],
       [-0.0009,  0.0000,  0.3827,  ...,  0.0000,  0.0000,  0.0000],
       ...,
       [ 0.0000,  0.0000,  0.0000,  ...,  0.0000, -0.0237,  0.0000],
       [ 0.0000,  0.0000,  0.0000,  ..., -0.4143, -0.8342,  0.0000],
       [ 0.0000,  0.0000,  0.0000,  ..., -0.2155, -0.1278, -0.1084]],
      device='cuda:0')
analytical:tensor([[ 0.4043,  0.0049, -0.0100,  ...,  0.0000,  0.0000,  0.0000],
       [ 0.1934,  0.0695,  0.0133,  ...,  0.0000,  0.0000,  0.0000],
       [-0.0011,  0.0000,  0.3829,  ...,  0.0000,  0.0000,  0.0000],
       ...,
       [ 0.0000,  0.0000,  0.0000,  ...,  0.0000, -0.0237,  0.0000],
       [ 0.0000,  0.0000,  0.0000,  ..., -0.4146, -0.8340,  0.0000],
       [ 0.0000,  0.0000,  0.0000,  ..., -0.2157, -0.1280, -0.1084]],
      device='cuda:0')
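
The numerical and analytical Jacobians above agree to about three decimal places. The PyTorch documentation notes that the default tolerances of `torch.autograd.gradcheck` (`atol=1e-5`, `rtol=1e-3`) are designed for double-precision inputs, so differences of this size may simply reflect a float32 test rather than a broken build. That is an assumption, not something the log confirms. A minimal sketch that applies the same closeness rule to the entries visible in the log (the helper names are hypothetical):

```python
# Entries copied from the visible parts of the two tensors in the log.
numerical = [0.0048, 0.1935, 0.0132, -0.0009, 0.3827, -0.4143, -0.8342, -0.2155, -0.1278]
analytical = [0.0049, 0.1934, 0.0133, -0.0011, 0.3829, -0.4146, -0.8340, -0.2157, -0.1280]


def within_tolerance(a, n, atol, rtol):
    """Closeness rule |a - n| <= atol + rtol * |n|."""
    return abs(a - n) <= atol + rtol * abs(n)


def count_failures(atol, rtol):
    """Count visible entries that violate the closeness rule."""
    return sum(
        not within_tolerance(a, n, atol, rtol)
        for a, n in zip(analytical, numerical)
    )


if __name__ == "__main__":
    max_diff = max(abs(a - n) for a, n in zip(analytical, numerical))
    print("largest visible difference:", max_diff)
    print("failures with default tolerances:", count_failures(atol=1e-5, rtol=1e-3))
    print("failures with atol=1e-3:", count_failures(atol=1e-3, rtol=1e-3))
```

If the looser tolerance removes every failure, rerunning the check with double-precision inputs or a larger `atol` would be a reasonable next step before concluding that the compiled operator is wrong.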

What puzzles me most is that the following commands ran successfully in all three environments:

(py38torch171-smoke) zxl@R9000P:~/mywork/MANA-AI/DCNv2_latest$ python
Python 3.8.10 (default, Jun  4 2021, 15:09:15) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.7.1'
>>> torch.cuda.is_available()
True
>>>

(py38torch181-smoke) zxl@R9000P:~/mywork/MANA-AI/DCNv2_latest$ python
Python 3.8.10 (default, Jun  4 2021, 15:09:15) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.8.1+cu111'
>>> torch.cuda.is_available()
True
>>>

(py38torch190-smoke) zxl@R9000P:~/mywork/MANA-AI/DCNv2_latest$ python
Python 3.8.10 (default, Jun  4 2021, 15:09:15) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.9.0+cu111'
>>> torch.cuda.is_available()
True
>>>
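
Note that `torch.cuda.is_available()` returning `True` only shows that a driver and a device were found. It does not show that the installed binaries contain kernels for the card's architecture (the RTX 3070 reports compute capability 8.6, i.e. `sm_86`). A minimal sketch of a stricter check follows. The helper names `parse_arch` and `build_supports_device` are hypothetical, and the compatibility rule (same major version for `sm_` binaries, any older `compute_` PTX) is a simplification.

```python
import re


def parse_arch(arch):
    """Split e.g. 'sm_86' into ('sm', 86); return None if unrecognised."""
    match = re.fullmatch(r"(sm|compute)_(\d+)", arch)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def build_supports_device(capability, arch_list):
    """Return True if any compiled arch can plausibly run on the device."""
    major, minor = capability
    device = major * 10 + minor
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue
        kind, number = parsed
        # Binary code: same major version and not newer than the device.
        if kind == "sm" and number // 10 == major and number <= device:
            return True
        # PTX: can be JIT-compiled for any device that is not older.
        if kind == "compute" and number <= device:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(cap, archs, build_supports_device(cap, archs))
```

Running this in each of the three environments would show whether a given PyTorch build actually targets the GPU, which the `is_available()` check alone cannot tell.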

I have been researching this all day without success. Could you explain how to configure the environment for RTX 30 series graphics cards with CUDA 11.1? Thanks a lot!

Answer 1 · 2021-07-07T02:54:15.000Z

I have the same issue. Looking forward to the solution~

Answer 2 · 2021-08-02T07:58:25.000Z

I haven't tested on the RTX 30 series, but I have tested in a CUDA 11.0 environment.
Please try this:
#56