Inference RuntimeError: CUDA out of memory
rui2016 opened this issue · 3 comments
Hi,
when running preparation.py for 3DMatch, I got the following error:
RuntimeError: CUDA out of memory. Tried to allocate 5.15 GiB (GPU 0; 10.76 GiB total capacity; 6.27 GiB already allocated; 3.52 GiB free; 6.29 GiB reserved in total by PyTorch)
Is this normal behavior? Since this is a provided demo, I would assume it should run without such issues on a GPU with 11GB of memory.
Also, could you give a rough estimate of the inference runtime, e.g., how long it takes to process 4096 keypoints?
Many thanks!
Hi, @rui2016, thanks for your interest in our work!
You can try reducing the step_size in preparation.py according to your GPU memory usage:
SpinNet/ThreeDMatch/Test/preparation.py, line 97 (commit 5581e7d)
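In rough terms, step_size controls how many keypoint patches are pushed through the network at once, so only one chunk lives on the GPU at a time. Below is a simplified sketch of that batching idea (not the exact code in preparation.py; `model` and `patches` are placeholders):

```python
import torch

step_size = 40  # lower this value if you still hit CUDA out-of-memory

# Placeholders standing in for the SpinNet descriptor network and the
# stacked local patches around the keypoints.
model = torch.nn.Linear(3, 32).cuda().eval()   # dummy network
patches = torch.randn(8192, 3)                 # dummy patches

descriptors = []
with torch.no_grad():
    # Process the keypoints chunk by chunk to bound peak GPU memory.
    for start in range(0, patches.shape[0], step_size):
        chunk = patches[start:start + step_size].cuda()
        descriptors.append(model(chunk).cpu())  # move results off the GPU right away
descriptors = torch.cat(descriptors, dim=0)
```

Smaller step_size lowers peak memory at the cost of more forward passes, so expect the total runtime to grow somewhat as you decrease it.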
For the current version, it may take more than ten seconds to process 5000 keypoints at a time, and the whole inference may take three or four hours.
Best,
Sheng
Hi @aosheng1996,
thanks for the reply. Yes, I was able to run it by decreasing step_size to 40. Nevertheless, I still have the impression that the current implementation is not very efficient in terms of both memory and runtime. On the GPU mentioned above (11GB), extracting descriptors for 8192 keypoints takes ~35s (with step_size=40). The descriptiveness of the descriptors looks promising, though.
Looking forward to an improved version if that is in your plan.
Cheers!
As far as I can see, SpinNet requires much more runtime to extract descriptors. Taking the 3DMatch benchmark as an example, extracting 5k + 5k = 10k descriptors consumes about ~75s, while the classical FPFH descriptor takes less than 1s.
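For comparison, an FPFH baseline can be computed in well under a second on a fragment of this size, e.g., with Open3D. A rough sketch follows; the file name and search radii are illustrative and not the actual benchmark settings:

```python
import open3d as o3d

# Hypothetical file name; any 3DMatch fragment point cloud would do.
pcd = o3d.io.read_point_cloud("cloud_bin_0.ply")

# FPFH needs normals; the radii here are illustrative, not tuned values.
pcd.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

fpfh = o3d.pipelines.registration.compute_fpfh_feature(
    pcd,
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.25, max_nn=100))

print(fpfh.data.shape)  # (33, num_points): one 33-dim FPFH descriptor per point
```

The gap is expected to some degree, since FPFH is a handcrafted histogram while SpinNet runs a deep network per patch, but it illustrates why a faster inference path would be valuable.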