Inference RuntimeError: CUDA out of memory

Hi,

when running preparation.py for 3DMatch, I got the following error:

RuntimeError: CUDA out of memory. Tried to allocate 5.15 GiB (GPU 0; 10.76 GiB total capacity; 6.27 GiB already allocated; 3.52 GiB free; 6.29 GiB reserved in total by PyTorch)

Is this normal behavior? Since this is a provided demo, I would expect it to run without such an issue on a GPU with 11 GB of memory.

Also, could you give a rough number on the runtime for inference, e.g., how long it needs to process 4096 keypoints?

Many thanks!

Hi, @rui2016, thanks for your interest in our work!

You can try to reduce the step_size in preparation.py appropriately according to the GPU memory usage.

SpinNet/ThreeDMatch/Test/preparation.py

Line 97 in 5581e7d

step_size = 100
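As a rough sketch of the idea behind lowering step_size: the keypoints are processed in slices, so peak memory scales with the slice size rather than with the total number of keypoints. The helper below is not taken from the SpinNet code. `describe_in_chunks`, `describe` and `min_step_size` are hypothetical names used only for illustration, and the automatic halving on an out-of-memory error is an added convenience, not something preparation.py is known to do.

```python
from typing import Callable, List, Sequence


def describe_in_chunks(
    keypoints: Sequence,
    describe: Callable[[Sequence], List],
    step_size: int = 100,
    min_step_size: int = 1,
) -> List:
    """Run `describe` over `keypoints` in slices of at most `step_size` items.

    `describe` stands in for the network forward pass (hypothetical).
    If a slice fails with an out-of-memory RuntimeError, the slice size is
    halved and the same slice is retried.
    """
    descriptors: List = []
    start = 0
    while start < len(keypoints):
        chunk = keypoints[start:start + step_size]
        try:
            out = describe(chunk)
        except RuntimeError as err:
            # Re-raise anything that is not an OOM, or if we cannot shrink further.
            if "out of memory" not in str(err) or step_size <= min_step_size:
                raise
            step_size = max(min_step_size, step_size // 2)
            continue  # retry the same position with a smaller slice
        descriptors.extend(out)
        start += len(chunk)
    return descriptors
```

With PyTorch, `describe` would typically wrap the model call in `torch.no_grad()`, and calling `torch.cuda.empty_cache()` before a retry may help release cached blocks. Both are assumptions about how one might wire this up, not a description of the repository.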

For the current version, it may take more than ten seconds to process 5000 keypoints at a time, and the whole inference may take three or four hours.

Best,
Sheng

Hi @aosheng1996,

thanks for the reply. Yes, I was able to run it by decreasing step_size to 40. Nevertheless, I still have the impression that the current implementation might not be very efficient in terms of either memory or runtime. Using the GPU mentioned above (11 GB), extracting descriptors for 8192 keypoints takes ~35 s (with step_size=40). The descriptiveness of the descriptors looks promising, though.

Looking forward to an improved version if that is in your plan.

Cheers!

As far as I can see, SpinNet requires much more runtime to extract descriptors. Taking the 3DMatch benchmark as an example, extracting 5k + 5k = 10k descriptors with SpinNet takes about 75 s, while the classical FPFH descriptor takes less than 1 s.
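
To compare the timings quoted in this thread on a common scale, they can be converted to milliseconds per keypoint. This is only a back-of-the-envelope sketch: the numbers are the approximate figures reported above, they come from different setups, and `ms_per_keypoint` is a hypothetical helper name.

```python
def ms_per_keypoint(seconds: float, n_keypoints: int) -> float:
    """Average extraction time per keypoint, in milliseconds."""
    return 1000.0 * seconds / n_keypoints


# (total seconds, number of keypoints), as reported in the posts above
reports = {
    "8192 keypoints, step_size=40, 11 GB GPU": (35.0, 8192),
    "10k descriptors on 3DMatch": (75.0, 10_000),
}

for label, (seconds, n) in reports.items():
    print(f"{label}: {ms_per_keypoint(seconds, n):.2f} ms per keypoint")
```

By the same arithmetic, the "less than 1 s" quoted for FPFH on 10k keypoints corresponds to under 0.1 ms per keypoint.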