Check failed: error == cudaSuccess (8 vs. 0) invalid device function
twtygqyy opened this issue · 38 comments
I have no problem running demo.py from fast-rcnn. However, when I try to run demo.py from py-faster-rcnn after successfully running make -j8 and make pycaffe, I get the following error:
Loaded network /home/ubuntu/py-faster-rcnn/data/faster_rcnn_models/ZF_faster_rcnn_final.caffemodel
F1008 04:30:16.139123 5360 roi_pooling_layer.cu:91] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***
Does anyone else have the same problem?
I'm running the code on a K520 with 4 GB of GPU memory. Is it because the code does not support this GPU? CPU mode works fine.
You might find some solutions here.
@rbgirshick I got the same error, but I can run fast-rcnn on the GPU when I compile caffe-fast-rcnn with the same Makefile.config.
I got the same error too. I have done many tests and also tried editing Makefile.config, but I still get the same "invalid device function" error; sometimes it is raised in a different .cu file rather than roi_pooling_layer.cu. So could the version of caffe-fast-rcnn that py-faster-rcnn uses have a compatibility problem? And if I want to use another version of Caffe, e.g. the Caffe in fast-rcnn, which files should I copy into py-faster-rcnn's caffe-fast-rcnn? @rbgirshick
I got the same error too. I have carefully read the solutions @rbgirshick pointed to, but the error persists. In the end, I had to go back to the MATLAB version.
I have done many tests, and I found that this error may be caused by some function in faster-rcnn that fast-rcnn doesn't have. When I use the ImageNet model there is no error and everything runs well, but after RPN training, when generating proposals, the "invalid device function" error occurs. Strangely, the first call to net.forward in im_proposals() runs, but the second call fails. When I comment out the last layer:
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rois'
  top: 'scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}
everything runs well. So could this problem be caused by some function specific to faster-rcnn? @rbgirshick
I have fixed the problem; after some modifications, it now runs well.
@PierreHao Could you share your modifications?
OK. It works for me, but for your problem you should test it yourself. I found that it runs in CPU mode, so the problem is on the GPU side. In the NMS code, the nms_gpu version is called by default, so if we use Caffe in GPU mode together with nms_gpu, there is an error for our type of GPU (I'm not sure; this is my guess). You can change nms_wrapper.py to use CPU mode, or comment out the nms call and the related code in the forward() function of proposal_layer.py. CPU-mode NMS is slow; commenting out NMS is faster. You can try it yourself. Good luck!
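For readers who want to see what the CPU fallback actually computes, here is a minimal NumPy sketch of greedy non-maximum suppression plus a switch modeled on the USE_GPU_NMS idea. It is an illustration under assumptions, not py-faster-rcnn's own nms_wrapper.py: the module-level USE_GPU_NMS flag and the nms() dispatcher are hypothetical stand-ins, and the GPU branch is deliberately left unimplemented because it needs the compiled nms.gpu_nms extension.

```python
import numpy as np

# Hypothetical stand-in for the USE_GPU_NMS switch in lib/fast_rcnn/config.py;
# with False, every call goes to the CPU implementation below.
USE_GPU_NMS = False


def cpu_nms(dets, thresh):
    """Greedy non-maximum suppression on the CPU.

    dets: (N, 5) array of [x1, y1, x2, y2, score] rows.
    thresh: IoU above which a lower-scoring box is suppressed.
    Returns the indices of kept boxes, highest score first.
    """
    x1, y1, x2, y2, scores = (dets[:, k] for k in range(5))
    # "+ 1" treats coordinates as inclusive pixel indices.
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices by descending score

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box.
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop every box that overlaps box i by more than the threshold.
        order = rest[iou <= thresh]
    return keep


def nms(dets, thresh):
    """Dispatch on the flag; the GPU path is intentionally not implemented here."""
    if dets.shape[0] == 0:
        return []
    if USE_GPU_NMS:
        raise NotImplementedError("GPU NMS requires the compiled nms.gpu_nms extension")
    return cpu_nms(dets, thresh)
```

For example, with the boxes [10, 10, 50, 50, 0.9], [12, 12, 52, 52, 0.8] and [100, 100, 150, 150, 0.7], nms(dets, 0.3) keeps indices [0, 2]: the second box overlaps the first with an IoU of about 0.83 and is suppressed, which is why several windows appear for one object when NMS is skipped.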
@twtygqyy @PierreHao I've pushed a small change to demo.py that I hope will fix the underlying problem. Let me know if you have a chance to check the patch. Thanks.
@PierreHao Thank you for the information. I tried commenting out NMS, but that did not get rid of the error.
@rbgirshick Thanks for the update. However, I still get the same error after modifying the code.
@rbgirshick I think the problem is caused by the GPU: some GPU models can't call a GPU program from within another GPU program. When I try a Titan, it works; when I try two different Tesla cards, I get the "invalid device function" error (but the error goes away if I use CPU-mode NMS).
@PierreHao For me, changing the line __C.USE_GPU_NMS = True to __C.USE_GPU_NMS = False in py-faster-rcnn/lib/fast_rcnn/config.py solves the problem. Thanks for the information. It took about 0.975s for 300 object proposals. This is not faster than fast-rcnn, which takes 2.205s for 21007 object proposals. But if you don't do NMS, multiple windows appear for a single object.
@sunshineatnoon 0.975s means that you are using CPU-mode NMS, which is why it runs slowly.
@PierreHao If you delete all the NMS code, will multiple bounding boxes appear in an image?
@sunshineatnoon If you delete NMS in the training process, there may be an error. NMS is not strictly necessary; without it, multiple bounding boxes appear. You can try it.
Finally found the solution. You need to change the architecture to match your GPU here:
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/setup.py#L134
@rbgirshick any chance we can support multiple architectures in there?
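On the question of supporting several architectures at once: one option, sketched here as an assumption rather than anything the repository ships, is to replace the single '-arch=sm_35' entry in the 'nvcc' argument list of lib/setup.py with several -gencode pairs. The helper name nvcc_gencode_args is hypothetical; it emits one real binary per compute capability and, optionally, PTX for the newest one so that later GPUs can JIT-compile it.

```python
def nvcc_gencode_args(capabilities, ptx_for_newest=True):
    """Build nvcc -gencode arguments for several compute capabilities.

    capabilities: iterable of (major, minor) tuples, e.g. [(3, 0), (3, 5)].
    Emits one real binary (code=sm_XY) per capability and, if requested,
    PTX (code=compute_XY) for the newest one. Hypothetical helper.
    """
    # Sort as tuples so (10, 0) correctly comes after (9, 0).
    caps = sorted(set(capabilities))
    args = []
    for major, minor in caps:
        code = f"{major}{minor}"
        args += ["-gencode", f"arch=compute_{code},code=sm_{code}"]
    if ptx_for_newest and caps:
        major, minor = caps[-1]
        code = f"{major}{minor}"
        args += ["-gencode", f"arch=compute_{code},code=compute_{code}"]
    return args
```

For example, nvcc_gencode_args([(3, 0), (3, 5)]) returns binaries for sm_30 and sm_35 plus compute_35 PTX; those strings would take the place of '-arch=sm_35' in the 'nvcc' list. The trade-off is longer compile times and a larger .so in exchange for one build that loads on more GPUs.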
@alantrrs Can you explain how to change the architecture? My GPU is a Quadro K4000.
@sunshineatnoon I believe your GPU has a Kepler architecture, so you can change sm_35 to sm_30.
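Picking the right sm_XY value comes down to your card's compute capability. The lookup below is a hypothetical helper that covers only GPUs mentioned in this thread; the values come from NVIDIA's published compute-capability list, and any other card has to be added to the table by hand.

```python
# Compute capabilities (major, minor) of GPUs mentioned in this thread,
# per NVIDIA's published list; extend the table for other cards.
COMPUTE_CAPABILITY = {
    "GRID K520": (3, 0),
    "Quadro K4000": (3, 0),
    "GeForce GTX 760": (3, 0),
    "GeForce GTX 1060": (6, 1),
    "GeForce GTX 1080": (6, 1),
}


def arch_flag(gpu_name):
    """Return the nvcc -arch flag for a GPU listed above (hypothetical helper)."""
    try:
        major, minor = COMPUTE_CAPABILITY[gpu_name]
    except KeyError:
        raise KeyError(f"unknown GPU {gpu_name!r}; look up its compute capability") from None
    return f"-arch=sm_{major}{minor}"
```

For the Quadro K4000 this yields -arch=sm_30, matching the advice above.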
@alantrrs I changed my setup.py file like this, but I still got the error:
Extension('nms.gpu_nms',
    ['nms/nms_kernel.cu', 'nms/gpu_nms.pyx'],
    library_dirs=[CUDA['lib64']],
    libraries=['cudart'],
    language='c++',
    runtime_library_dirs=[CUDA['lib64']],
    # this syntax is specific to this build system
    # we're only going to use certain compiler args with nvcc and not with gcc
    # the implementation of this trick is in customize_compiler() below
    extra_compile_args={'gcc': ["-Wno-unused-function"],
                        'nvcc': ['-arch=sm_30',
                                 '--ptxas-options=-v',
                                 '-c',
                                 '--compiler-options',
                                 "'-fPIC'"]},
    include_dirs=[numpy_include, CUDA['include']]
)
@PierreHao thanks Pierre for your solution!
In $FCN_ROOT/lib/fast_rcnn/config.py, set __C.USE_GPU_NMS = False.
It worked in my case (using a GPU on AWS).
@twtygqyy What did you change? Your GPU is old; I have tested a GPU with compute capability 5.0 and everything runs well.
@PierreHao I changed the setting from sm_35 to sm_30. I'm using an AWS g2.8xlarge instance.
@twtygqyy Hi, I get the same error too if I set __C.USE_GPU_NMS = True in $FCN_ROOT/lib/fast_rcnn/config.py. I'm using an AWS g2.0xlarge instance. How can I change the architecture to solve the problem? Thanks a lot.
@zimenglan-sysu-512 If you're using a GPU instance on AWS, please change the architecture setting to:
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_50,code=sm_50 \
-gencode arch=compute_50,code=compute_50
This is because the GPU on AWS does not support compute_35.
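Why does a missing architecture surface as "invalid device function"? The sketch below models, in simplified form, how the CUDA driver picks code from a compiled binary: a real sm_XY binary loads only on GPUs with the same major version and an equal or higher minor version, while embedded compute_XY PTX can be JIT-compiled for any GPU of that capability or newer. The function can_run and its target encoding are hypothetical, and the model ignores edge cases such as architecture-specific variants.

```python
def can_run(device_cc, targets):
    """Simplified model of whether a CUDA fat binary has code a GPU can load.

    device_cc: (major, minor) compute capability, e.g. (3, 0) for a GRID K520.
    targets: list of (kind, (major, minor)) pairs, where kind is "sm" for a
             real binary (cubin) or "compute" for embedded PTX.
    Returns False when nothing matches; a kernel launch would then fail
    with "invalid device function".
    """
    for kind, (major, minor) in targets:
        # A real sm_XY binary loads only on the same major version, minor >= Y.
        if kind == "sm" and device_cc[0] == major and device_cc[1] >= minor:
            return True
        # compute_XY PTX can be JIT-compiled for any capability >= X.Y.
        if kind == "compute" and tuple(device_cc) >= (major, minor):
            return True
    return False


# Only sm_35, as in the original lib/setup.py.
SM35_ONLY = [("sm", (3, 5))]
# The CUDA_ARCH setting suggested above: sm_30 and sm_50 binaries plus compute_50 PTX.
AWS_SETTING = [("sm", (3, 0)), ("sm", (5, 0)), ("compute", (5, 0))]
```

Under this model, a build containing only sm_35 has nothing a K520 (compute capability 3.0) can load, whereas the CUDA_ARCH setting above covers it through the sm_30 binary and covers newer cards through the compute_50 PTX.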
@twtygqyy I changed the setting from sm_35 to sm_30 and removed the *_50 lines, but it did not work. What other settings should be changed? Thanks.
@twtygqyy I have solved the problem. In my case, I'm using a K520 on AWS. Thanks for your help. Here is my solution (three steps):
1. If you're using a GPU instance on AWS, change the architecture setting in Makefile.config to:
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_50,code=sm_50 \
-gencode arch=compute_50,code=compute_50
This is because the GPU on AWS does not support compute_35.
2. I changed sm_35 to sm_30 in the lib/setup.py file.
3. cd lib and remove these files if they exist: utils/bbox.c, nms/cpu_nms.c, nms/gpu_nms.cpp.
Then run: make && cd ../caffe/ && make clean && make -j8 && make pycaffe -j8
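The cleanup in step 3 can also be scripted. The sketch below uses a hypothetical helper, clean_lib, that deletes the generated sources listed in that step plus any compiled *.so modules in lib/nms and lib/utils, so that the next make rebuilds them with the new architecture flag. It assumes the directory layout named in the steps; the rebuild itself is still done with make.

```python
from pathlib import Path

# Generated sources named in step 3; Cython and nvcc regenerate them on rebuild.
GENERATED = ["utils/bbox.c", "nms/cpu_nms.c", "nms/gpu_nms.cpp"]


def clean_lib(lib_dir):
    """Delete stale generated sources and compiled *.so modules under lib/.

    Returns the sorted list of removed paths, relative to lib_dir.
    Hypothetical helper; adjust the patterns if your layout differs.
    """
    lib = Path(lib_dir)
    stale = [lib / rel for rel in GENERATED]
    # Compiled extension modules left over from the previous (sm_35) build.
    stale += list(lib.glob("nms/*.so")) + list(lib.glob("utils/*.so"))
    removed = []
    for path in stale:
        if path.is_file():
            path.unlink()
            removed.append(path.relative_to(lib).as_posix())
    return sorted(removed)
```

Calling clean_lib("lib") from the repository root before running make ensures that no module compiled for the old architecture is silently reused.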
@zimenglan-sysu-512 Sorry for the late reply; I'm glad to hear that your problem has been solved.
Good Luck!
@alantrrs Thanks for the pointer! That fixed the problem.
@sunshineatnoon Did you remove the *.so files and recompile $FRCN_ROOT/lib?
@rodrigob I removed the *.so files in $FRCN_ROOT/lib/nms and $FRCN_ROOT/lib/utils, and now it works. Thanks very much!
@sunshineatnoon I use a GeForce GTX 760 and came across the problem too. My solution:
- I changed sm_35 to sm_30 in the lib/setup.py file, and
- in $FCN_ROOT/lib/fast_rcnn/config.py, set __C.USE_GPU_NMS = False.
That solved the problem. The difference between --cpu mode and --gpu mode is:
GPU: Detection took 0.158s for 100 object proposals
CPU: Detection took 1.505s for 100 object proposals
Wonderful! Thank you for your answer!
@xiaohujecky Note that if you set __C.USE_GPU_NMS = False, then changing sm_35 in lib/setup.py should have no effect. The sm_35 value is a CUDA compilation setting and affects only GPU code.
In any case, I still face this error. It is pretty simple to reproduce: run Faster-RCNN training and, alongside it, run a simple CUDA program that tries to cudaMalloc as much GPU memory as it can grab. Faster-RCNN training will crash with this error. Neither of the above solutions worked for me.
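The reproduction recipe above relies on a helper process that holds as much GPU memory as it can. A minimal sketch of that grabbing strategy follows; it assumes a hypothetical allocator callback, try_alloc, that returns True when an allocation succeeds. In a real test this callback would wrap cudaMalloc (for example through ctypes), which is not shown here; the fake allocator in the test stands in for the GPU.

```python
def grab_all(try_alloc, start_bytes, min_bytes=1 << 20):
    """Greedily reserve as much memory as an allocator will hand out.

    try_alloc(n): hypothetical callback that tries to reserve n bytes and
    returns True on success (a real version would wrap cudaMalloc).
    The chunk size halves after each failure until it falls below min_bytes.
    Returns the total number of bytes reserved.
    """
    total, chunk = 0, start_bytes
    while chunk >= min_bytes:
        if try_alloc(chunk):
            total += chunk  # keep grabbing chunks of the same size
        else:
            chunk //= 2     # too big for what is left; try a smaller chunk
    return total
```

Starting with a large chunk and halving on failure fills the free memory in a handful of calls rather than probing it byte by byte; the allocations must of course stay alive while training runs.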
I'm getting this error with
$ docker run -ti caffe:gpu caffe --version
libdc1394 error: Failed
caffe version 1.0.0-rc3
and
$ nvidia-smi
Tue Oct 25 15:08:35 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 370.28 Driver Version: 370.28 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 0000:01:00.0 On | N/A |
| 0% 48C P8 7W / 200W | 62MiB / 8105MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 0000:02:00.0 Off | N/A |
| 0% 38C P8 7W / 200W | 1MiB / 8113MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1241 G /usr/lib/xorg/Xorg 60MiB |
+-----------------------------------------------------------------------------+
and
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
Changing sm_35 to sm_30 in lib/setup.py and setting __C.USE_GPU_NMS = False in $FCN_ROOT/lib/fast_rcnn/config.py worked well in my case. Thank you @xiaohujecky
Still getting this on a GTX 1060. I tried __C.USE_GPU_NMS = False.
Update: Adding -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN=52;60 -DCUDA_ARCH_PTX=60 to the CMake options resolved it. Setting __C.USE_GPU_NMS = False made no difference.
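One caveat when passing these options: an unquoted ';' ends a command in a POSIX shell, so -DCUDA_ARCH_BIN=52;60 needs quoting if typed directly on the command line. The sketch below uses a hypothetical helper, caffe_cmake_arch_args, that builds the same three options and then prints a correctly quoted command with shlex.join (Python 3.8+). Passing the list straight to subprocess without a shell avoids the quoting issue entirely.

```python
import shlex


def caffe_cmake_arch_args(bin_archs, ptx_archs):
    """Build Caffe's CMake options that pin the CUDA architectures.

    bin_archs / ptx_archs: lists like ["52", "60"] (hypothetical helper).
    """
    return [
        "-DCUDA_ARCH_NAME=Manual",
        "-DCUDA_ARCH_BIN=" + ";".join(bin_archs),
        "-DCUDA_ARCH_PTX=" + ";".join(ptx_archs),
    ]


args = ["cmake", *caffe_cmake_arch_args(["52", "60"], ["60"]), ".."]
# shlex.join quotes the argument containing ';' so a shell sees one word.
print(shlex.join(args))
```

This prints cmake -DCUDA_ARCH_NAME=Manual '-DCUDA_ARCH_BIN=52;60' -DCUDA_ARCH_PTX=60 .. which can be pasted into a shell as-is.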
Not sure why this issue is closed when it still seems to be a constant problem.
Changing the architecture in https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/setup.py#L134 to match my GPU, as suggested above, worked for me.
Hi, can anyone advise where the setup.py file that I need to change is located in a Windows 10 environment?
I get the same error when trying to run OpenposeVideo.bat.
I understand my NVIDIA card should be using sm_86, but I'm fine with not using the GPU if it doesn't really work with this OpenPose script.