mjiUST/SurfaceNet

Can't work on Theano 1: theano.sandbox.cuda.dnn is removed in the new version

cdb0y511 opened this issue · 7 comments

Could you update your source file layer.py?
theano.sandbox.cuda.dnn has been removed in Theano 1 (> Theano 0.9), so
from theano.sandbox.cuda.dnn import gpu_contiguous, GpuDnnConvDesc, gpu_alloc_empty, GpuDnnConv3dGradW
no longer works, and neither does the check if lasagne.utils.theano.sandbox.cuda.dnn_available() in similarityNet.py.
Could you use theano.gpuarray.dnn instead?
I can't replace gpu_contiguous, GpuDnnConvDesc, gpu_alloc_empty, GpuDnnConv3dGradW with the classes of theano.gpuarray.dnn by myself.
I can't go back to Theano 0.9 either, because the new version of cuDNN does not support the old Theano and pygpu.
Please help me. Thanks!

Nice work on 3D reconstruction! I have some similar issues here.

@mjiUST Could you give us some tips to make the code running on a newer system?
My system is:

  • Ubuntu 16.04.2 LTS (amd64)
  • CUDA 8.0 / 9.1, cuDNN 7.1 (edit: I installed cuDNN 5.1 instead)

Or:

Do you have suggestions for running/training without cuDNN?

I noticed there are some if-branches, like this one in similarityNet.py:

if lasagne.utils.theano.sandbox.cuda.dnn_available(): # when cuDNN available
    from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer 
else:
    from lasagne.layers import Conv2DLayer as ConvLayer

But in layers.py and SurfaceNet.py, some cudnn functions are hardcoded

  • from lasagne.layers.dnn import Conv3DDNNLayer, Pool3DDNNLayer
  • from theano.sandbox.cuda.dnn import gpu_contiguous, GpuDnnConvDesc, gpu_alloc_empty, GpuDnnConv3dGradW

Following the same logic as the if-branch, I might be able to hack Conv3DDNNLayer and Pool3DDNNLayer to:

from lasagne.layers import Conv3DLayer as Conv3DDNNLayer
from lasagne.layers import Pool3DLayer as Pool3DDNNLayer

But for other functions like gpu_contiguous, I haven't found any replacements so far. If you have any suggestions, please let us know! Thanks!
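
Before editing layers.py, it can help to check which of the two cuDNN module paths named in this thread actually imports on a given machine. The following is a minimal sketch under assumptions: first_importable and BACKEND_CANDIDATES are hypothetical names, not part of SurfaceNet, Theano, or Lasagne. A successful import of theano.gpuarray.dnn only shows that the new backend loads; it does not by itself port the GpuDnnConv3dGradW call.

```python
import importlib


def first_importable(candidates):
    """Return (name, module) for the first candidate that imports, else (None, None)."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except Exception:
            # Missing module, missing GPU, or a backend that refuses to load.
            continue
    return None, None


# Module paths named in this thread: new gpuarray backend first, old sandbox backend second.
BACKEND_CANDIDATES = ["theano.gpuarray.dnn", "theano.sandbox.cuda.dnn"]

if __name__ == "__main__":
    backend_name, _module = first_importable(BACKEND_CANDIDATES)
    print("cuDNN binding found:", backend_name)
```

If neither path imports, the function returns (None, None), which would be the point to fall back to the non-cuDNN layers suggested above.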

@cdb0y511 How are things going with you?

Dear @cdb0y511 @Rubikplayer ,

Thanks for the issue report. I specified the older Theano version

conda install -c rdonnelly theano -y # 0.9.0 version theano

Since the 3D dilated conv layer was implemented using some APIs in CUDNN, I'm not sure whether we could easily discard CUDNN.

If you are worried that the installation may affect the versions of your existing packages, please feel free to use SurfaceNet/installEnv.sh, which will not change anything in your existing Python, Theano, or ~/.bashrc. All you need to do is specify the CUDA/cuDNN paths accordingly. Please refer to the updated README.

Hope this may help.

@mjiUST
Thanks a lot, and well done. I am a Ph.D. candidate too. Maybe we can discuss your work one day.
But first, I want to figure out how it works.
I have read installEnv.sh, and I fully understand how to use conda to install a specific Theano 0.9 (even though your scripts install the latest Theano).
You don't need to discard cuDNN.
The problem is that theano.sandbox is the old backend. You had better switch to the new backend, theano.gpuarray. Please see https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end(gpuarray)
Otherwise new drivers and new CUDA versions may not be compatible with it. I know you use NVIDIA driver 375, CUDA 8.0, and cuDNN v5.1. But I need CUDA 9.0 and cuDNN v7.1.1 for TensorFlow 1.6, so the latest NVIDIA driver has been installed.

Even when I use Theano 0.9, I get:
Exception: ('The following error happened while compiling the node', <theano.sandbox.cuda.DnnVersion object at 0x7f9028151110>(), '\n', 'The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem: http://developer.nvidia.com/cuda-downloads')

The only options are to switch to the new backend, theano.gpuarray, or to give up CUDA 9.0 and cuDNN v7.1.1 and go back to NVIDIA driver 375, CUDA 8.0, and cuDNN v5.1. It's hard to choose, and it certainly limits your work.

@Rubikplayer I can't find gpu_contiguous either, even in the Theano 0.9 documentation. So I guess only the original author can fix it.

@cdb0y511
Thanks for your interest and looking forward to having further discussion.

I don't know whether you have tried this method: say you have both /usr/local/cuda-8.0 and /usr/local/cuda, which is linked to cuda-9.0. Change the first line of ~/miniconda2/envs/SurfaceNet/etc/conda/activate.d/activate-cuda.sh to export CUDA_ROOT=/usr/local/cuda-8.0. This does not affect the settings in your .bashrc; the change applies only once you source activate SurfaceNet. In this way, even though you may have multiple CUDA versions on your PC, a particular one can be specified without ANY influence on your other projects (for example, TensorFlow and PyTorch).

Similarly, you can specify a cuDNN without influencing other projects by changing the first line of ~/miniconda2/envs/SurfaceNet/etc/conda/activate.d/activate-cudnn.sh to any path where the cuDNN folder is located, e.g., export CUDNN_ROOT=/home/<user-name>/libs/cudnn-8.0-v5.1.

I highly recommend installing cuDNN outside of the CUDA folder, so that you can have any combination of CUDA + cuDNN by defining specific environment variables in different conda environments.
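
To confirm that the activation scripts picked up the intended toolkit, one option is to inspect CUDA_ROOT and CUDNN_ROOT from inside the activated environment. This is only a sketch: check_toolkit_roots is a hypothetical helper, and it checks nothing beyond whether each variable is set and points at an existing directory.

```python
import os


def check_toolkit_roots(env=None, names=("CUDA_ROOT", "CUDNN_ROOT")):
    """Map each variable name to (value, is_existing_directory)."""
    env = os.environ if env is None else env
    report = {}
    for name in names:
        value = env.get(name)
        # An unset variable is reported as (None, False).
        report[name] = (value, bool(value) and os.path.isdir(value))
    return report


if __name__ == "__main__":
    for name, (value, ok) in check_toolkit_roots().items():
        print("%s=%s (%s)" % (name, value, "found" if ok else "missing"))
```

Running it once before and once after source activate SurfaceNet should show whether the per-environment values replaced the system-wide ones.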

Please feel free to post any queries.

@mjiUST @cdb0y511
Yes, yesterday I did the following, and main.py now starts running (although some other error occurs):

  • Install CuDNN 5.1 (as you mentioned in "install outside cuda folder")
  • Install theano 0.9, by conda install theano=0.9
  • Specify CUDA version, by exporting environment variable
export CUDA_ROOT=/usr/local/cuda-8.0
export PATH=$PATH:$CUDA_ROOT/bin
export LD_LIBRARY_PATH=$CUDA_ROOT/lib64:$LD_LIBRARY_PATH
export CPATH=$CUDA_ROOT/include:$CPATH
export LIBRARY_PATH=$CUDA_ROOT/lib64:$LIBRARY_PATH

and setting theano config in ~/.theanorc:

[cuda] 
root=/usr/local/cuda-8.0
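
A quick way to double-check what such a file declares is to parse it with Python's standard configparser. The sketch below is an illustration only: cuda_root_from_theanorc and SAMPLE_THEANORC are hypothetical names, the sample text simply repeats the two lines above, and the code is written for Python 3 (the module is called ConfigParser on Python 2).

```python
import configparser

# Same content as the ~/.theanorc fragment shown above.
SAMPLE_THEANORC = """
[cuda]
root=/usr/local/cuda-8.0
"""


def cuda_root_from_theanorc(text):
    """Return the root value of the [cuda] section, or None if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.get("cuda", "root", fallback=None)


if __name__ == "__main__":
    print(cuda_root_from_theanorc(SAMPLE_THEANORC))
```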

@cdb0y511 You can give it a try too. I have multiple CUDA versions installed, and I also installed two versions of cuDNN (although I might have overwritten 7.1 with 5.1).

For the error I encountered, I will open another issue. Thanks for the feedback!
Edit: new issue opened: (#4)

@Rubikplayer
Thank you for the feedback. To be precise,

  • before specifying an external cuDNN, the original one should be removed OR unlinked (removed from the environment variables LD_LIBRARY_PATH, CPATH, and LIBRARY_PATH)

  • to install the 0.9 version of Theano, please use the command:

    conda install -c rdonnelly theano -y # 0.9.0 version theano

    The one you mentioned, conda install theano=0.9, results in a 0.9 version with a different commit hash.
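
Whichever command is used, the installed release can be checked from Python through theano.__version__. The snippet below is a small sketch: is_theano_0_9 is a hypothetical helper that only compares the major and minor numbers, so it cannot tell two 0.9 builds with different commit hashes apart.

```python
def is_theano_0_9(version):
    """True if a version string such as '0.9.0' belongs to the 0.9 series."""
    parts = version.split(".")
    return parts[:2] == ["0", "9"]


if __name__ == "__main__":
    try:
        import theano
    except Exception as err:
        # Covers a missing install as well as a backend that fails on import.
        print("Theano could not be imported:", err)
    else:
        print(theano.__version__, is_theano_0_9(theano.__version__))
```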

@mjiUST Thanks for the response.

  • Yes, as I found in another thread, indeed different versions of CuDNN can result in errors.
  • Thanks for the info! It seems the conda-installed version is okay for now. If any problem comes up, I will switch to the version you specified.