GPU memory leak
m-pilia opened this issue · 12 comments
I ran two batches of registrations overnight (scripted with the Python API) on two different machines, and both crashed after about 30 registrations due to OOM on the device. I just launched one batch again, and it's leaking around 290 MiB of device memory for each registration.
Oh, that's bad. I'll take a look. Probably some weird thing I didn't consider with constructors/destructors on CUDA.
I have been taking a look at this this morning. 290 MiB ≈ 302 MB is the total size of the fixed and moving pyramids when registering two POEM volumes using two channels. A run with cuda-memcheck confirms that the leak is coming from `GpuRegistrationEngine`, mostly from `set_image_pair`. However, it seems the mask pyramid is not leaking, so it is probably something wrong with the vector of pyramids. I am trying to dig into this...
Looking into it, it seems like there's some `GpuVolume` not being released somewhere. There are two dangling `GpuVolumeData`s that cause the leak.
That, plus the two `thrust::device_vector` instances in the GPU landmark cost function.
A fix is to manually call the destructor of the image pyramids within the destructor of `GpuRegistrationEngine`, but I would like to understand why it is not called automatically. I was thinking that maybe the `shared_ptr` of the `_volume_data` gets captured somewhere by accident.
The destructor of `GpuVolumePyramid` seems to be invoked, so calling it again is probably just cleaning up somebody else's mess.
I have found the problem. I think we were pretty much misusing `unique_ptr` in `GpuUnaryFunction`; replacing it with a `shared_ptr` seems to solve all the issues.
Hm, but how is it misused? It should be unique, right? Otherwise we would have the same problem on CPU.
Ohh, `GpuSubFunction` has no destructor!
Oh, that makes sense!
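Concretely, what seems to go wrong: when a derived cost function is destroyed through a base-class pointer whose destructor is not virtual, the derived destructor never runs, so members like a `thrust::device_vector` never release their device memory. A minimal sketch of that failure mode (illustrative class names, not the actual deform types; needs nvcc because of Thrust):

```cpp
#include <memory>
#include <thrust/device_vector.h>

struct SubFunction {
    // No destructor declared, so the implicit one is non-virtual.
};

struct LandmarkFunction : public SubFunction {
    // Owns ~4 MiB of device memory.
    thrust::device_vector<float> displacements =
        thrust::device_vector<float>(1 << 20);
};

int main()
{
    // Destroying a LandmarkFunction through a SubFunction* is undefined
    // behaviour when the base destructor is not virtual; in practice only
    // ~SubFunction() runs, so the device_vector is never freed.
    std::unique_ptr<SubFunction> fn = std::make_unique<LandmarkFunction>();
    fn.reset(); // the device allocation leaks here
    return 0;
}
```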
Weird, shouldn't we have had the same problem with a `shared_ptr` too?
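For what it's worth, that is most likely because `std::shared_ptr` type-erases its deleter: the control block created by `make_shared<Derived>()` (or by constructing from a `Derived*`) destroys the object as a `Derived`, so the derived destructor runs even though the base one is not virtual. `std::unique_ptr<Base>` just does `delete` on a `Base*`, which is where the leak (and the undefined behaviour) comes from. A self-contained sketch:

```cpp
#include <cstdio>
#include <memory>

struct Base {
    ~Base() { std::puts("~Base"); }       // not virtual
};

struct Derived : public Base {
    ~Derived() { std::puts("~Derived"); } // this is what would free GPU resources
};

int main()
{
    {
        // The control block remembers the object is a Derived, so both
        // destructors run despite the non-virtual base destructor.
        std::shared_ptr<Base> p = std::make_shared<Derived>();
    } // prints "~Derived" then "~Base"

    {
        // default_delete<Base> does `delete Base*`: undefined behaviour, and
        // in practice only ~Base() runs, which matches the leak in this issue.
        std::unique_ptr<Base> p = std::make_unique<Derived>();
    } // typically prints only "~Base"
    return 0;
}
```

Note that this only works because the `shared_ptr` is created from the derived type; a virtual destructor in the base class is still the more robust fix.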
As a side note, we should probably have a pure virtual destructor for this kind of base class.
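For reference, a sketch of that idiom, with the rest of the interface elided (a plain `virtual ~GpuSubFunction() = default;` would also make deletion through the base pointer well defined; the pure virtual form additionally keeps the base class abstract):

```cpp
struct GpuSubFunction {
    virtual ~GpuSubFunction() = 0;
    // ... rest of the cost-function interface ...
};

// A pure virtual destructor still needs a definition, because every derived
// destructor implicitly calls it.
inline GpuSubFunction::~GpuSubFunction() = default;
```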