Error running with rtx3080 graphics card

Question

Error running with rtx3080 graphics card

Opened this issue 3 years ago · 16 comments

RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
GPU: RTX 3080
CUDA: 10.0
PyTorch: 1.0.0
cuDNN: 7.3.5
Traceback (most recent call last):
  File "./tools/train.py", line 256, in <module>
    main()
  File "./tools/train.py", line 154, in main
    pred_r, pred_t, pred_c, emb = estimator(img, points, choose, idx)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/Object-RPE-master/DenseFusion/lib/network.py", line 96, in forward
    out_img = self.cnn(img)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/Object-RPE-master/DenseFusion/lib/network.py", line 36, in forward
    x = self.model(x)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/Object-RPE-master/DenseFusion/lib/pspnet.py", line 65, in forward
    f, class_f = self.feats(x)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/Object-RPE-master/DenseFusion/lib/extractors.py", line 115, in forward
    x = self.conv1(x)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)

Is my graphics card the cause of this error?
I also tried installing CUDA 11.1, PyTorch 1.8, and cuDNN 8.0.5, but then I get: RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method.
How can I solve this problem?

Answer 1 · 2022-01-12T02:48:10.000Z

Hey, as far as I know, the RTX 30 series only supports CUDA >= 11.1.
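
A rough way to check this locally is to compare the GPU's compute capability with the CUDA version the installed PyTorch build was compiled against. The sketch below is only illustrative: the table `MIN_CUDA_FOR_CAPABILITY` is a small hand-written subset rather than an official list, and the helper names `parse_version` and `cuda_is_new_enough` are made up for this example. The RTX 3080 and RTX 3090 report compute capability 8.6. The last part runs only if PyTorch is importable and a CUDA device is visible.

```python
# Minimum CUDA toolkit version able to target each compute capability.
# Illustrative subset written for this example, not an official table.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing, e.g. RTX 20 series
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3080 / RTX 3090
}


def parse_version(text):
    """Turn a version string such as '11.1' into the tuple (11, 1)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def cuda_is_new_enough(capability, cuda_version):
    """Return True if cuda_version can target a GPU with this capability."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        raise ValueError("capability %r is not in the illustrative table" % (capability,))
    return parse_version(cuda_version) >= required


# The setup from the question: RTX 3080 (capability 8.6) with CUDA 10.0.
print(cuda_is_new_enough((8, 6), "10.0"))  # False
print(cuda_is_new_enough((8, 6), "11.1"))  # True

# Optional: query the local machine when PyTorch and a CUDA device are present.
try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available() and torch.version.cuda:
    cap = tuple(torch.cuda.get_device_capability(0))
    if cap in MIN_CUDA_FOR_CAPABILITY:
        print(cap, torch.version.cuda, cuda_is_new_enough(cap, torch.version.cuda))
```

With the versions from the question (CUDA 10.0 on capability 8.6) the check fails, which is consistent with the advice above.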

Answer 2 · 2022-03-18T11:10:56.000Z

Hey @Xushuangyin, the issue is knn's use of a deprecated autograd function, as mentioned in the last error message. I used pull request #170 and it worked.
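
For readers unfamiliar with this deprecation, here is a minimal sketch of what "new-style autograd function with static forward method" means. It is not the code from #170 and not the repository's knn wrapper; `ScaleByK` is a hypothetical toy function used only to show the pattern: arguments that used to go through `__init__` are passed to `forward`, per-call state lives on `ctx`, and the function is invoked through `.apply`.

```python
import torch
from torch.autograd import Function

# Legacy style (rejected by recent PyTorch releases): state is kept on the
# instance and forward is an instance method.
#
#     class ScaleByK(Function):
#         def __init__(self, k):
#             self.k = k
#         def forward(self, x):
#             return x * self.k
#
#     y = ScaleByK(2.0)(x)


class ScaleByK(Function):
    """New style: static methods, with per-call state stored on ctx."""

    @staticmethod
    def forward(ctx, x, k):
        ctx.k = k  # non-tensor state can be stored as a ctx attribute
        return x * k

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input; k is a plain number, so it gets None.
        return grad_output * ctx.k, None


x = torch.ones(3, requires_grad=True)
y = ScaleByK.apply(x, 2.0)  # call through .apply instead of instantiating
y.sum().backward()
print(x.grad)
```

A wrapper around a compiled extension would presumably need the same kind of change, but the exact edits for this repository are the ones described in #170.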

Answer 3 · 2022-04-14T02:19:41.000Z

@Xushuangyin
Were you able to solve this issue?

Answer 4 · 2022-04-14T02:47:09.000Z

Hello, I didn't solve this problem in the end. I couldn't find a knn_pytorch build suitable for the RTX 30 series.

Answer 5 · 2022-04-14T02:49:34.000Z

I would like to ask: if I train the pose estimation model on the LINEMOD dataset and then run it in a real environment, using a camera to estimate in real time the pose of objects from the dataset, can I expect to get the desired result? Thank you!

Answer 6 · 2022-04-14T05:56:39.000Z

@Xushuangyin
I'm still in the middle of training but it seems like I'm able to train the LINEMOD dataset on the rtx30s series.

Steps:

git clone -b Pytorch-1.0 https://github.com/j96w/DenseFusion.git
Modify the files and run the terminal commands as shown in #170 (I'm using CUDA 11.3 and it seems to be working just fine).

Hope it helps!

Answer 7 · 2022-04-14T06:06:52.000Z

I'll try it later. Strangely, I can train on the dataset I made and on part of the LINEMOD dataset, but not on the whole LINEMOD dataset. Thank you.

Answer 8 · 2022-04-15T00:12:52.000Z

@Xushuangyin
Hello. Would it be possible for you to upload your DenseFusion project to your GitHub repository?
I would love to see how you have made your own dataset work.
Thank you in advance.

Answer 9 · 2022-04-15T11:32:36.000Z

This is a link to the method I used to make the datasets.
https://github.com/F2Wang/ObjectDatasetTools @jc0725

Answer 10 · 2022-04-15T13:52:31.000Z

Hi @Xushuangyin, can you give us a bit more detail on what you modified in the code in order to train on your custom dataset?
Did you resize the images? Did you change num_points? I noticed the loop doesn't load all objects; it skips object 7?

I am having shape issues:

ValueError: operands could not be broadcast together with shapes (540,960,4) (3,)

Answer 11 · 2022-04-15T14:02:27.000Z

Did you use ObjectDatasetTools to create the dataset? I think I ran into this problem before, but I'm sorry, I've forgotten exactly how I solved it. You can try changing the shape of the image: it is probably 4-channel, and you need to convert it to 3 channels before the matrix multiplication.
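
As an illustration of that suggestion, the sketch below drops the fourth (alpha) channel with NumPy so that a per-channel operation broadcasts again. The helper name `drop_alpha`, the all-zero stand-in image, and the `per_channel` values are assumptions made for this example; they are not taken from the DenseFusion or ObjectDatasetTools code.

```python
import numpy as np


def drop_alpha(image):
    """Return an (H, W, 3) view of an (H, W, 3) or (H, W, 4) image array."""
    if image.ndim != 3 or image.shape[2] not in (3, 4):
        raise ValueError("expected an (H, W, 3) or (H, W, 4) array, got %r" % (image.shape,))
    return image[:, :, :3]


# Stand-in for an RGBA frame with the shape from the error message above.
rgba = np.zeros((540, 960, 4), dtype=np.uint8)
per_channel = np.array([0.5, 0.5, 0.5])  # hypothetical per-channel values

# rgba * per_channel would raise the broadcasting ValueError: (540, 960, 4) vs (3,)
rgb = drop_alpha(rgba)
result = rgb * per_channel  # broadcasts (540, 960, 3) against (3,)
print(rgb.shape, result.shape)
```

If the images are loaded with Pillow, calling `.convert("RGB")` on the opened image before turning it into an array is another way to get three channels.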

Answer 12 · 2022-04-17T02:37:08.000Z

I modified the files, but I still get an error when evaluating the LINEMOD model. Can you share the specific code you modified? What exactly did you change? Thank you.

Answer 13 · 2022-04-17T22:12:56.000Z

Hey, I resized the image like you said and I no longer get the ValueError.
@jc0725 do you know what the values of num_points, num_pt_mesh_large and num_pt_mesh_small should be?

I have models of my objects, and some of them have fewer than 100 vertices. Is num_pt_mesh_small the minimum number of vertices?

I currently have a shape issue, and I think it's related to num_points:

Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 2-dimensional input of size [1, 1] instead

Answer 14 · 2022-04-19T12:21:46.000Z

When training on my own dataset, I didn't modify the parameters you mentioned; I kept them the same as for the LINEMOD dataset.

Answer 15 · 2022-07-24T14:11:48.000Z

@Xushuangyin I'm still in the middle of training but it seems like I'm able to train the LINEMOD dataset on the rtx30s series.

Steps:

git clone -b Pytorch-1.0 https://github.com/j96w/DenseFusion.git

modify files and follow the terminal code as shown in Pytorch 1.6 and lib knn build with cuda 10.2 #170 (I'm using CUDA 11.3 and seems to be working just fine)

Hope it helps!

Hello @jc0725! I tried pytorch=1.8.0, torchvision=0.9.0, cuda=11.1 on the RTX 30 series, and I also followed the steps in #170. However, when I try to train on the LINEMOD dataset, I get "ImportError: /home/chenkai/code/DenseFusion-Pytorch/lib/knn/knn_pytorch.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_10E". Have you run into this problem? Thank you very much!

Answer 16 · 2022-09-20T09:23:38.000Z

The RTX 3090 has the same problem.