jzbontar/mc-cnn

Problem occurred after updating torch

Sarah20187 opened this issue · 3 comments

After updating my torch installation with ./update.sh, I ran the command below and got the following error:
th ./main.lua mb slow -a predict -net_fname net/net_mb_slow_-a_train_all.t7 -left ../data/md/2005_2006/Wood2/view1.png -right ../data/md/2005_2006/Wood2/view5.png -disp_max 70

cudnnFindConvolutionForwardAlgorithm failed: 4 convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA2,1,1110,1306 -filtA112,1,3,3 2,112,1110,1306 -padA1,1 -convStrideA1,1 CUDNN_DATA_FLOAT
/home/fzehua/torch/install/bin/luajit: /home/fzehua/torch/install/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionForwardAlgorithm failed, sizes: convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA2,1,1110,1306 -filtA112,1,3,3 2,112,1110,1306 -padA1,1 -convStrideA1,1 CUDNN_DATA_FLOAT
stack traceback:
[C]: in function 'error'
/home/fzehua/torch/install/share/lua/5.1/cudnn/find.lua:483: in function 'forwardAlgorithm'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:190: in function 'updateOutput'
./main.lua:911: in function 'forward_free'
./main.lua:962: in function 'stereo_predict'
./main.lua:1101: in main chunk
[C]: in function 'dofile'
...ehua/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

Did you update CUDNN to 5.1 also?
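
A quick way to check which cuDNN version Torch actually picks up might look like the sketch below. It assumes the installed cudnn.torch bindings expose the runtime version as `cudnn.version`; that field name is an assumption about the bindings, not something confirmed in this thread. If the installed library does not match what the bindings expect, the `require` itself may fail first, which is also informative.

```lua
-- Sketch: print the cuDNN version that the Torch bindings load at runtime.
-- Assumption: the cudnn.torch bindings expose it as `cudnn.version`,
-- an integer encoded as major*1000 + minor*100 + patch.
require 'cudnn'
print('cuDNN version seen by Torch: ' .. tostring(cudnn.version))
```

Running this with `th` on the same machine and account that runs main.lua should show whether the server-wide library or a local copy is being loaded.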

You can try disabling the cudnn auto-tuning optimization. Change the flag cudnn.benchmark = true in main.lua to false and see if that fixes the problem.

Setting cudnn.benchmark to true is why cudnnFindConvolutionForwardAlgorithm is being called.
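
For reference, the flag change suggested above would look roughly like this. It is a minimal sketch, assuming the flag is set once near the top of main.lua after cudnn is loaded; the exact location in the file is not shown in this thread.

```lua
-- Sketch of the suggested change, placed wherever main.lua sets the flag.
require 'cudnn'

-- true:  cudnn runs its auto-tuner (cudnnFindConvolutionForwardAlgorithm)
--        to pick a convolution algorithm for each layer configuration.
-- false: the auto-tuner is skipped and built-in heuristics choose instead.
cudnn.benchmark = false
```

With the flag off, the failing call in find.lua should no longer be reached, though prediction may run slower because the chosen algorithm is not necessarily the fastest one.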

I had a similar problem, and I think it is a memory issue.

Thank you for your reply! I finally found that this problem occurs because someone upgraded cuDNN to 5.0 on the server. I fixed it by adding the dependencies in my home directory.

@zhFuECL Hi, I am also working on this code for image dense matching. Can we talk about it a little bit? Here is my QQ: 1376519063