CUDNN_STATUS_INTERNAL_ERROR while running main.py
Rubikplayer opened this issue · 4 comments
Hi, thanks for the previous feedback in another thread. After I set up CUDA 8.0 / cuDNN 5.1 and Theano 0.9, I can run part of main.py. But there is still an error when executing the patch2embedding() function in the early-rejection stage.
More specifically:
Traceback (most recent call last):
File "./main.py", line 27, in <module>
save_npz_file_path = main_reconstruct.reconstruction(datasetFolder, _model, imgNamePattern, poseNamePattern, outputFolder, N_viewPairs4inference, resol, BB, viewList)
File "/home/ICT2000/tli/Workspace/SurfaceNet/main_reconstruct.py", line 77, in reconstruction
cubeCenter_hw = np.stack([img_h_cubesCenter, img_w_cubesCenter], axis=0)) # (N_cubes, N_views, D_embedding), (N_cubes, N_views)
File "./utils/earlyRejection.py", line 31, in patch2embedding
patches_embedding[:,:] = patch2embedding_fn(patch_allBlack)[0] # don't use np.repeat (out of memory)
File "/home/ICT2000/tli/.conda/envs/SurfaceNet/lib/python2.7/site-packages/theano/compile/function_module.py", line 898, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/ICT2000/tli/.conda/envs/SurfaceNet/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/ICT2000/tli/.conda/envs/SurfaceNet/lib/python2.7/site-packages/theano/compile/function_module.py", line 884, in __call__
self.fn() if output_subset is None else\
RuntimeError: error doing operation: CUDNN_STATUS_INTERNAL_ERROR
Apply node that caused the error: GpuDnnConv{algo='small', inplace=False}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty{dtype='float32', context_name=None}.0, GpuDnnConvDesc{border_mode=(1, 1), subsample=(1, 1), conv_mode='cross', precision='float32'}.0, Cast{float32}.0, Cast{float32}.0)
Toposort index: 276
Inputs types: [GpuArrayType<None>(float32, (False, False, False, False)), GpuArrayType<None>(float32, (False, False, False, False)), GpuArrayType<None>(float32, (False, False, False, False)), <theano.gof.type.CDataType object at 0x7fbd6848bc90>, Scalar(float32), Scalar(float32)]
Inputs shapes: [(1, 3, 64, 64), (64, 3, 3, 3), (1, 64, 64, 64), 'No shapes', (), ()]
Inputs strides: [(49152, 16384, 256, 4), (108, 36, 12, 4), (1048576, 16384, 256, 4), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL at 0x7fbb43bd10c0>, 1.0, 0.0]
Inputs type_num: [11, 11, 11, '', 11, 11]
Outputs clients: [[HostFromGpu(gpuarray)(GpuDnnConv{algo='small', inplace=False}.0)]]
A detailed error log can be seen here:
err_log.txt
I have tried:
- Deleting the Theano cache: theano-cache purge or rm -rf ./.theano
- Adjusting CNMeM: (https://devtalk.nvidia.com/default/topic/950158/cnmem-limitations-when-using-cudnn/)
- Removing the NVIDIA cache and rebooting: (https://stackoverflow.com/questions/45810356/runtimeerror-cudnn-status-internal-error)
None of these has worked so far.
Have you seen this type of error before? Or did I not set up my machine correctly?
I noticed you have a params.py to specify all parameters. Some have mentioned this error can result from a lack of memory (link), and it seems your code already does some batch processing.
Info of my setting:
- Ubuntu 16.04
- CUDA 8.0 / CuDNN 5.1
- GPU: Nvidia 1080 Ti (11GB memory) --- I also tried on another machine with a Titan X; it did not work either
- theano 0.9
My ~/.theanorc:
[global]
floatX=float32
device=cuda0
optimizer=None
allow_gc=True
#gpuarray.preallocate=0.95
gcc.cxxflags=-Wno-narrowing
exception_verbosity=high
[lib]
cnmem=0.75
[nvcc]
nvcc.fastmath=True
[cuda]
root=/usr/local/cuda-8.0
If you have any suggestions, please let me know! Thanks for your help and support!
Update:
After I removed the other versions of cuDNN (https://groups.google.com/forum/#!topic/theano-users/w4M3Xy0ec60), the error changed to the following.
Traceback (most recent call last):
File "./main.py", line 27, in <module>
save_npz_file_path = main_reconstruct.reconstruction(datasetFolder, _model, imgNamePattern, poseNamePattern, outputFolder, N_viewPairs4inference, resol, BB, viewList)
File "/home/ICT2000/tli/Workspace/SurfaceNet/main_reconstruct.py", line 77, in reconstruction
cubeCenter_hw = np.stack([img_h_cubesCenter, img_w_cubesCenter], axis=0)) # (N_cubes, N_views, D_embedding), (N_cubes, N_views)
File "./utils/earlyRejection.py", line 48, in patch2embedding
_patches_embedding_inScope[_batch] = patch2embedding_fn(_patches_preprocessed[_batch]) # (N_batch, 3/1, patchSize, patchSize) --> (N_batch, D_embedding). similarityNet: patch --> embedding
File "/home/ICT2000/tli/.conda/envs/SurfaceNet/lib/python2.7/site-packages/theano/compile/function_module.py", line 898, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/ICT2000/tli/.conda/envs/SurfaceNet/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/ICT2000/tli/.conda/envs/SurfaceNet/lib/python2.7/site-packages/theano/compile/function_module.py", line 884, in __call__
self.fn() if output_subset is None else\
File "pygpu/gpuarray.pyx", line 676, in pygpu.gpuarray.pygpu_empty
File "pygpu/gpuarray.pyx", line 290, in pygpu.gpuarray.array_empty
pygpu.gpuarray.GpuArrayException: cuMemAlloc: CUDA_ERROR_OUT_OF_MEMORY: out of memory
Apply node that caused the error: GpuDnnConv{algo='small', inplace=False}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty{dtype='float32', context_name=None}.0, GpuDnnConvDesc{border_mode=(1, 1), subsample=(1, 1), conv_mode='cross', precision='float32'}.0, Cast{float32}.0, Cast{float32}.0)
Toposort index: 276
Inputs types: [GpuArrayType<None>(float32, (False, False, False, False)), GpuArrayType<None>(float32, (False, False, False, False)), GpuArrayType<None>(float32, (False, False, False, False)), <theano.gof.type.CDataType object at 0x7f703a019c90>, Scalar(float32), Scalar(float32)]
Inputs shapes: [(1100, 3, 64, 64), (64, 3, 3, 3), (1100, 64, 64, 64), 'No shapes', (), ()]
Inputs strides: [(49152, 16384, 256, 4), (108, 36, 12, 4), (1048576, 16384, 256, 4), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL at 0x7f6e133930c0>, 1.0, 0.0]
Inputs type_num: [11, 11, 11, '', 11, 11]
Outputs clients: [[HostFromGpu(gpuarray)(GpuDnnConv{algo='small', inplace=False}.0)]]
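(For reference, some back-of-envelope arithmetic on the shapes reported in the failing node --- my own estimate, not anything from the repo's code --- already puts this single conv over 1 GiB in float32, before counting the rest of the network or the cuDNN workspace:)

```python
# Rough float32 memory footprint of the failing conv node alone,
# using the shapes reported in the error message.
def tensor_bytes(shape, dtype_bytes=4):
    n = 1
    for d in shape:
        n *= d
    return n * dtype_bytes

inp  = tensor_bytes((1100, 3, 64, 64))    # input patches
filt = tensor_bytes((64, 3, 3, 3))        # conv filters
out  = tensor_bytes((1100, 64, 64, 64))   # output feature maps
total_gib = (inp + filt + out) / 1024.0 ** 3
print("%.2f GiB" % total_gib)  # ~1.12 GiB for this one node
```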
@Rubikplayer
For the updated error log, it mentions: pygpu.gpuarray.GpuArrayException: cuMemAlloc: CUDA_ERROR_OUT_OF_MEMORY: out of memory. Can you change cnmem=0.75 --> cnmem=0.95 in .theanorc, OR change __GPUMemoryGB = 11 to a safe value, say __GPUMemoryGB = 6, in params.py, and let's see what it prints out.
Also, for the Theano installation, please refer to #3 (comment)
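(A minimal sketch of the kind of batching that a memory cap like __GPUMemoryGB implies --- the function names here are hypothetical and only illustrative; the actual logic lives in params.py and the repo's batch loops:)

```python
import numpy as np

def run_in_batches(fn, x, max_batch):
    """Apply fn to x in chunks of at most max_batch samples, so no
    single call allocates full-size intermediate buffers on the GPU."""
    outs = [fn(x[i:i + max_batch]) for i in range(0, len(x), max_batch)]
    return np.concatenate(outs, axis=0)

def max_batch_from_memory(gpu_mem_gb, bytes_per_sample, headroom=0.5):
    """Hypothetical sizing helper: derive a safe batch size from a GPU
    memory budget, leaving headroom for workspace and other tensors."""
    budget = gpu_mem_gb * 1024 ** 3 * headroom
    return max(1, int(budget // bytes_per_sample))
```

With such a scheme, lowering the memory parameter shrinks the per-call batch instead of the total workload, which is why a smaller value can avoid cuMemAlloc failures at the cost of more (smaller) kernel launches.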
@mjiUST
The code seems to be running after I set gpuarray.preallocate=0.8 (and commented out cnmem=0.75). (This was before I saw your feedback; I will try your suggested values a bit later.)
May I confirm two questions with you:
- Theano/Lasagne is quite new to me, and I wasn't quite sure about the difference between gpuarray.preallocate and cnmem. According to the Theano doc link, it seems gpuarray.preallocate was designed for the new GPU backend, and cnmem for the old one. Since we are using version 0.9, I suppose I should set cnmem instead of gpuarray.preallocate? If so, then what I just set did not actually impose any limit.
- With my setting above, it seems to run on the example dinosaur data. After about 2 hours, it has finished 68% of the SurfaceNet inference. Is this typical, or is there any way to make it faster?
My setting changes: __GPUMemoryGB = 11 and __cube_D = 32.
Also, my GPU (1080 Ti) should be slower than a Titan X.
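(On the first question, my understanding from the Theano configuration docs --- not verified against 0.9 specifically --- is that the two flags live in different .theanorc sections, one per backend, and only the flag matching your device= setting takes effect:)

```ini
# Old CUDA backend (device=gpu*): CNMeM pool fraction
[lib]
cnmem=0.95

# New gpuarray backend (device=cuda*): preallocation fraction
[gpuarray]
preallocate=0.95
```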
Thanks for the help!!
@Rubikplayer
Thanks for your feedback. It's great to know the code is running.
- For the Theano memory preallocation, the link you mentioned says that after you set the Theano flag allow_gc to False (so Theano will not collect GPU memory garbage), CNMeM will not affect GPU speed anymore. In my opinion, CNMeM and gpuarray.preallocate are the same thing for the older and newer backends. Just use whichever one gets the GPU memory preallocated at the very beginning (you can use the command watch nvidia-smi to check that the majority of the memory is reserved).
- For the speed of SurfaceNet: the setting __cube_D = 64 could result in a slightly faster process. Before that, you can check whether your .theanorc includes optimizer=fast_run for the fast-run mode, as mentioned in Line 40 in 149f6e0. If everything goes well, the dinosaur dataset should finish in one hour.
@mjiUST
Thanks for the suggestion! Setting optimizer=fast_run indeed accelerates the process, but with __cube_D = 64 I still got an out-of-memory issue. I've sent an email to your school address with detailed questions.
Thanks again!