Maghoumi/pytorch-softdtw-cuda

Regarding GPU memory footprint

netw0rkf10w opened this issue · 1 comment

Hi @Maghoumi,

Thanks a lot for sharing your work!

I tried using your implementation to train a model that matches two sequences, and found that it consumes a lot of GPU memory, more than twice that of an optimal transport (OT) loss. With the exact same training configuration (batch size, etc.), OT fits in 16GB of GPU memory, while DTW does not fit even on a 32GB GPU. Do you think that this is expected?

Could you please tell me if it's possible to somehow reduce the memory footprint of DTW? I'm not sure if I should play with the bandwidth argument...

Thank you very much in advance for your reply!

Hello, and thanks for your interest in my work!

Yes, the large memory footprint is expected. The issue partly stems from the way the cost map is calculated, since we'd need to compute the pairwise matching cost between the two sequences (as done here). Also, my implementation relies on such input.
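To give a rough sense of scale, here is a back-of-the-envelope sketch in plain Python. The helper names `cost_map_bytes` and `broadcast_diff_bytes` are hypothetical and not part of this repository. The second helper assumes the pairwise squared Euclidean cost is formed by broadcasting both sequences to a `(batch, N, M, d)` tensor before reducing over the feature axis, which may or may not match how your cost function is written. The shapes in the example are made up, and the numbers ignore autograd buffers and the padded alignment matrices kept for the backward pass.

```python
def cost_map_bytes(batch, n, m, dtype_bytes=4):
    """Bytes needed for a dense (batch, n, m) cost map; float32 by default."""
    return batch * n * m * dtype_bytes


def broadcast_diff_bytes(batch, n, m, d, dtype_bytes=4):
    """Bytes for a (batch, n, m, d) intermediate, ASSUMING the pairwise
    difference x[:, :, None, :] - y[:, None, :, :] is materialized."""
    return batch * n * m * d * dtype_bytes


# Hypothetical training shapes, chosen only for illustration.
batch, n, m, d = 32, 512, 512, 256

print(f"cost map:         {cost_map_bytes(batch, n, m) / 2**20:.0f} MiB")           # 32 MiB
print(f"broadcasted diff: {broadcast_diff_bytes(batch, n, m, d) / 2**30:.0f} GiB")  # 8 GiB
```

Under these assumptions the cost map itself is modest, but a broadcast intermediate scales with the feature dimension as well and can dominate quickly, so it may be worth checking how the cost is computed in your setup.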

Unfortunately, playing with the bandwidth parameter doesn't help in this case: in this particular implementation it is only intended to reduce the runtime, not the memory footprint. That being said, with some modifications the bandwidth parameter could also be used to reduce the memory footprint. For instance, one could modify the algorithm so that the cost map is only calculated for the elements that fall inside the bandwidth region, and the kernel is executed using this "truncated" cost map.
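As a minimal sketch of what such a "truncated" cost map could look like, the plain-Python function below stores only the entries with `|i - j| <= bandwidth`, in an `N x (2 * bandwidth + 1)` layout instead of `N x M`. The function name `banded_cost_map`, the squared Euclidean cost, and the storage layout are all assumptions made for illustration. This is not how the repository currently works, and the CUDA kernel would still need to be adapted to index into the band.

```python
import math


def banded_cost_map(x, y, bandwidth):
    """Squared Euclidean costs restricted to |i - j| <= bandwidth.

    x: list of N feature vectors, y: list of M feature vectors.
    Returns an N x (2 * bandwidth + 1) list of lists, where column k of
    row i holds the cost between x[i] and y[i - bandwidth + k].
    Slots that fall outside y are filled with +inf.
    """
    n, m = len(x), len(y)
    width = 2 * bandwidth + 1
    band = [[math.inf] * width for _ in range(n)]
    for i in range(n):
        for k in range(width):
            j = i - bandwidth + k  # column index in the full cost map
            if 0 <= j < m:
                band[i][k] = sum((a - b) ** 2 for a, b in zip(x[i], y[j]))
    return band


# Toy 1-D sequences, for illustration only.
x = [[0.0], [1.0], [2.0], [3.0]]
y = [[0.0], [1.0], [2.0], [3.0]]
band = banded_cost_map(x, y, bandwidth=1)
print(band[0])  # [inf, 0.0, 1.0]
print(band[3])  # [1.0, 0.0, inf]
```

With this layout the storage grows as `N * (2 * bandwidth + 1)` rather than `N * M`, which is where the memory saving would come from. A real implementation would compute the band with tensor operations on the GPU rather than Python loops.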

At this point in time, I do not plan on exploring these, but I'd be more than happy to review pull requests that implement these features.