j96w/DenseFusion

Error running with rtx3080 graphics card


RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
GPU: RTX 3080
CUDA: 10.0
PyTorch: 1.0.0
cuDNN: 7.3.5
Traceback (most recent call last):
File "./tools/train.py", line 256, in <module>
main()
File "./tools/train.py", line 154, in main
pred_r, pred_t, pred_c, emb = estimator(img, points, choose, idx)
File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xsy/Object-RPE-master/DenseFusion/lib/network.py", line 96, in forward
out_img = self.cnn(img)
File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xsy/Object-RPE-master/DenseFusion/lib/network.py", line 36, in forward
x = self.model(x)
File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xsy/Object-RPE-master/DenseFusion/lib/pspnet.py", line 65, in forward
f, class_f = self.feats(x)
File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xsy/Object-RPE-master/DenseFusion/lib/extractors.py", line 115, in forward
x = self.conv1(x)
File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 320, in forward
self.padding, self.dilation, self.groups)

Is my graphics card the cause?
I tried installing CUDA 11.1, PyTorch 1.8, and cuDNN 8.0.5 instead, but then I get: RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method.
How can I solve this problem?

Hey, as far as I know, the RTX 30 series only supports CUDA >= 11.1.
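
For anyone who wants to check this before reinstalling, here is a minimal sketch that compares a GPU's compute capability against the CUDA version a PyTorch build was compiled with. The table is a small hand-written excerpt (not a complete or official list), and the helper names `parse_version`, `cuda_supports`, and `check_local_gpu` are made up for illustration. The RTX 3080 and 3090 report compute capability 8.6, which CUDA toolkits older than 11.1 cannot target.

```python
# Minimum CUDA toolkit version able to target a given compute capability.
# Hand-written excerpt for illustration; extend it for other GPUs.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),  # Turing, e.g. RTX 20 series
    (8, 0): (11, 0),  # Ampere data-center parts, e.g. A100
    (8, 6): (11, 1),  # Ampere consumer parts, e.g. RTX 3080 / 3090
}


def parse_version(text):
    """Turn a version string such as '11.1' into the tuple (11, 1)."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def cuda_supports(capability, cuda_version):
    """Return True if the CUDA version can target the compute capability."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        raise KeyError("capability %r is not in the example table" % (capability,))
    return parse_version(cuda_version) >= required


def check_local_gpu():
    """Check GPU 0 against the CUDA version of the installed PyTorch build.

    Returns None when no CUDA device or CUDA build is available.
    """
    import torch  # imported here so the helpers above work without PyTorch

    if not torch.cuda.is_available() or torch.version.cuda is None:
        return None
    capability = torch.cuda.get_device_capability(0)
    return cuda_supports(capability, torch.version.cuda)


print(cuda_supports((8, 6), "10.0"))  # the setup from the original post
print(cuda_supports((8, 6), "11.1"))
```

With the versions from the original post (CUDA 10.0 on an RTX 3080) the first check is False, which is consistent with the advice above to move to CUDA 11.1 or newer.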

hey @Xushuangyin, the issue is with the knn module's use of a deprecated autograd function, as mentioned in the last error message. I used pull request #170 and it worked.
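
To illustrate what the deprecation message is asking for, here is a minimal sketch of a new-style autograd function. It is a toy example, not the knn code from this repository and not the change made in #170: the `Square` operation is hypothetical. The point is only the pattern, where `forward` and `backward` are static methods that receive a context object, and the function is invoked through `.apply` instead of by instantiating the class and calling the instance.

```python
import torch
from torch.autograd import Function


class Square(Function):
    """Toy new-style autograd function computing y = x * x."""

    @staticmethod
    def forward(ctx, x):
        # State needed for the backward pass goes on ctx, not on self.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # dy/dx = 2x, chained with the incoming gradient.
        return 2 * x * grad_output


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = Square.apply(x)  # new style: call .apply, do not write Square()(x)
y.sum().backward()
print(x.grad)
```

Legacy code typically defines `forward(self, ...)` without `@staticmethod` and keeps parameters in `__init__`; under the new style, such parameters are usually passed as extra arguments to `apply` instead.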

@Xushuangyin
Were you able to solve this issue?

@Xushuangyin
I'm still in the middle of training, but it seems like I'm able to train on the LINEMOD dataset with an RTX 30 series card.

Steps:

  1. git clone -b Pytorch-1.0 https://github.com/j96w/DenseFusion.git
  2. Modify the files and run the terminal commands shown in #170, "Pytorch 1.6 and lib knn build with cuda 10.2" (I'm using CUDA 11.3 and it seems to be working just fine)

Hope it helps!

@Xushuangyin
Hello. Would it be possible for you to upload your DenseFusion project to your GitHub repository?
I would love to see how you have made your own dataset work.
Thank you in advance.

This is a link to the method I used to make the datasets.
https://github.com/F2Wang/ObjectDatasetTools @jc0725

hi @Xushuangyin, can you give us some details on what you modified in the code in order to train on your custom dataset?
Did you resize the images? Change num_points? I noticed the loop doesn't load all objects; it skips object 7?

I am having shape issues:

ValueError: operands could not be broadcast together with shapes (540,960,4) (3,)
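
For context on reading this message: NumPy compares shapes from the last axis backwards, so (540,960,4) and (3,) fail because 4 and 3 differ and neither is 1. The thread does not say which line raised the error, so the following is only a sketch of one way such shapes can arise, namely a 4-channel (for example RGBA) image combined with a 3-element per-channel array. The `to_rgb` helper and the `mean` values are hypothetical.

```python
import numpy as np


def to_rgb(image):
    """Return an H x W x 3 view, dropping a fourth (alpha) channel if present."""
    image = np.asarray(image)
    if image.ndim != 3 or image.shape[2] not in (3, 4):
        raise ValueError("expected an H x W x 3 or H x W x 4 array, got shape %r"
                         % (image.shape,))
    return image[:, :, :3]


rgba = np.zeros((540, 960, 4), dtype=np.float32)  # same shape as in the error
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # hypothetical values

broadcast_error = None
try:
    rgba - mean  # trailing axes are 4 and 3, so this cannot broadcast
except ValueError as err:
    broadcast_error = str(err)
print(broadcast_error)

centered = to_rgb(rgba) - mean  # trailing axes are now 3 and 3
print(centered.shape)
```

If the images in a custom dataset carry an alpha channel, slicing to the first three channels when loading is one option; whether that applies here depends on how the images were produced.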

hey, I resized the image like you said and I don't get the ValueError anymore.
@jc0725 do you know what the values of num_points, num_pt_mesh_large and num_pt_mesh_small should be?

I have models of my objects, and some of them have fewer than 100 vertices. Is num_pt_mesh_small the minimum number of vertices?

I currently have a shape issue, and I think it's related to num_points:

Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 2-dimensional input of size [1, 1] instead
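
A note on reading this message: a weight of shape [64, 3, 7, 7] belongs to a 2D convolution with 64 output channels, 3 input channels, and a 7x7 kernel, so it expects a batch of images shaped (N, 3, H, W). An input of size [1, 1] means something other than an image batch reached the network. The thread does not establish why, so the sketch below only shows a hypothetical guard (`check_conv2d_input`, not part of DenseFusion or PyTorch) that could be called on `img.shape` before the forward pass to catch the problem closer to the data loader. It assumes `groups=1`.

```python
def check_conv2d_input(input_shape, weight_shape):
    """Raise ValueError if input_shape cannot feed a 2D convolution.

    weight_shape is (out_channels, in_channels, kernel_h, kernel_w) and
    input_shape should be (batch, in_channels, height, width).
    Assumes groups=1, so the channel counts must match exactly.
    """
    if len(weight_shape) != 4:
        raise ValueError("weight must be 4-dimensional, got %r" % (tuple(weight_shape),))
    if len(input_shape) != 4:
        raise ValueError(
            "expected 4-dimensional input (N, C, H, W), got "
            f"{len(input_shape)}-dimensional input of size {list(input_shape)}"
        )
    if input_shape[1] != weight_shape[1]:
        raise ValueError(
            f"expected {weight_shape[1]} input channels, got {input_shape[1]}"
        )


weight_shape = (64, 3, 7, 7)  # shape reported in the error message

check_conv2d_input((1, 3, 120, 160), weight_shape)  # a valid batch passes silently

message = None
try:
    check_conv2d_input((1, 1), weight_shape)  # the shape from the error above
except ValueError as err:
    message = str(err)
print(message)
```

Printing the shapes of every item returned by the dataset for the failing sample is usually the quickest way to find where the unexpected tensor comes from.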

Hello @jc0725! I tried pytorch=1.8.0, torchvision=0.9.0, and cuda=11.1 on an RTX 30 series card, and I also followed the steps in #170. However, when I try to train on the LINEMOD dataset, I get this error: "ImportError: /home/chenkai/code/DenseFusion-Pytorch/lib/knn/knn_pytorch.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_10E". Have you run into this problem? Thank you very much!

The RTX 3090 has the same problem.