
with 11G memory still "Check failed: error == cudaSuccess (2 vs. 0) out of memory"

litingfeng opened this issue · 7 comments

I have a K40 with more than 11 GB of memory, but when I run demo_LocNet_object_detection_pipeline, it fails with
Check failed: error == cudaSuccess (2 vs. 0) out of memory. I thought 11 GB would be enough, because the README only requires 6 GB. Why does this happen?

Did you build Caffe with the cuDNN library? I think that without it, Caffe uses much more GPU memory in the convolutional layers, which is probably why you run out of GPU memory. Could you check?

@gidariss Yes, I did compile with cuDNN. I noticed that after running the first network (rec), 6 GB of memory was in use, and the error showed up when running the second network. Do I need to free GPU memory after the first network? If so, how?

@litingfeng No, you do not need to free GPU memory after the first network. What you can do is change lines 90 and 91 of the demo_LocNet_object_detection_pipeline.m script from:
model_obj_rec_max_rois_num_in_gpu = 500;
model_obj_loc_max_rois_num_in_gpu = 400;
to:
model_obj_rec_max_rois_num_in_gpu = 200;
model_obj_loc_max_rois_num_in_gpu = 200;

I just tried it and managed to run the demo on a 6 GB GPU. Could you try it as well and let me know?
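
To illustrate why lowering these two values should reduce peak memory, here is a rough sketch. It assumes that each max_rois_num_in_gpu setting caps how many candidate boxes go through a network in a single forward pass, so the boxes are processed in chunks. That reading is inferred from the variable names rather than taken from the LocNet code, and num_rois and the loop are made up for the example.

```matlab
% Hypothetical illustration, not the LocNet implementation.
% Assumption: max_rois_num_in_gpu caps the number of boxes per forward pass.
num_rois            = 1200;  % made-up number of candidate boxes for one image
max_rois_num_in_gpu = 200;   % value suggested above

num_chunks = ceil(num_rois / max_rois_num_in_gpu);
for chunk_idx = 1:num_chunks
    first_roi = (chunk_idx - 1) * max_rois_num_in_gpu + 1;
    last_roi  = min(chunk_idx * max_rois_num_in_gpu, num_rois);
    % A real pipeline would run the network on boxes first_roi..last_roi here.
    fprintf('chunk %d: boxes %d to %d\n', chunk_idx, first_roi, last_roi);
end
```

Under that assumption, a smaller cap means more forward passes with smaller per-pass buffers. The memory taken by the network weights themselves would not change.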


@gidariss I fetched a fresh copy with git, but it still doesn't work. I even tried 50 and 100, and both ran out of memory. While it was running, I checked GPU usage with nvidia-smi, and it showed that 11408 MB was indeed in use. Thank you for your patience.
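
For anyone who wants to log the same numbers from inside MATLAB while the demo runs, here is a minimal sketch. It assumes nvidia-smi is on the system path and supports the --query-gpu option; the variable names are only for illustration.

```matlab
% Minimal sketch: read used/total GPU memory (in MiB) through nvidia-smi.
% Assumes nvidia-smi is on the PATH and prints one line per GPU.
cmd = 'nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits';
[status, out] = system(cmd);
if status ~= 0
    error('nvidia-smi failed: %s', out);
end
vals = sscanf(out, '%f, %f');    % all numbers as one column vector
mem  = reshape(vals, 2, []).';   % rows: GPUs, columns: [used total]
for gpu_idx = 1:size(mem, 1)
    fprintf('GPU %d: %d / %d MiB used\n', gpu_idx - 1, mem(gpu_idx, 1), mem(gpu_idx, 2));
end
```

Running this once after the first network is loaded and again after the second should show roughly how much memory each one adds.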

Later today, I tried script_test_object_detection_pipeline_PASCAL.m. It works without any modification, which is strange.

It seems that in the demo, you do not call caffe.reset_all(); after each network.

@litingfeng Regarding the script_test_object_detection_pipeline_PASCAL.m, it uses a single model (either the recognition or the localization model) at a time and that is why you do not have any problem running it. However, the LocNet_object_detection() function, which is used in the demo, uses both models simultaneously. Do you mean that you placed caffe.reset_all() calls inside the LocNet_object_detection() function?
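
For reference, the one-model-at-a-time pattern described above can be sketched with the matcaffe interface roughly as follows. The file names are placeholders, not paths from the repository, and this is not how LocNet_object_detection() is written, since that function needs both networks loaded together.

```matlab
% Sketch only: load, use, and clear two networks one after the other.
% The .prototxt/.caffemodel names below are placeholders.
caffe.set_mode_gpu();
caffe.set_device(0);   % assumes the GPU of interest is device 0

rec_net = caffe.Net('rec_deploy.prototxt', 'rec_weights.caffemodel', 'test');
% ... run the recognition model here ...
caffe.reset_all();     % clears all nets and solvers created so far

loc_net = caffe.Net('loc_deploy.prototxt', 'loc_weights.caffemodel', 'test');
% ... run the localization model here ...
caffe.reset_all();
```

With this pattern only one network occupies the GPU at any time, which would explain why the PASCAL test script fits in memory even when the demo does not.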

As I said, I did not have any problem running the demo on a 6 GB GPU, so it is strange that in your case it cannot run on an 11 GB GPU.