dotnet/TorchSharp

The function torch.cuda.empty_cache() is missing.

lintao185 opened this issue · 6 comments


Even though torch.NewDisposeScope() has been used, the GPU memory usage remains high.
I've been debugging for a whole day with no progress. This memory management issue is very tricky, and I'm currently at a loss.
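
For reference, here is the per-iteration scoping pattern that is normally enough to keep steady-state CUDA usage flat. The model, optimizer, and data below are placeholders, not code from the project; only the scoping shape matters:

```csharp
using TorchSharp;
using static TorchSharp.torch;

// Placeholder model, optimizer, and data; only the scoping pattern matters.
var device = cuda.is_available() ? CUDA : CPU;
var model = nn.Linear(10, 1).to(device);
var optimizer = optim.SGD(model.parameters(), 0.01);

for (int step = 0; step < 1000; step++)
{
    // One scope per iteration: the activations, the loss, and the other
    // temporaries created inside it are disposed when the scope closes.
    using var scope = torch.NewDisposeScope();

    var x = randn(64, 10).to(device);
    var y = randn(64, 1).to(device);

    optimizer.zero_grad();
    var loss = nn.functional.mse_loss(model.forward(x), y);
    loss.backward();
    optimizer.step();
}
```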

might be similar to #1194

And as for the memory issue... we don't even know what happens in your loop. It's really hard to help you if you don't provide a way to reproduce the problem.

The strange thing is that I added torch.NewDisposeScope() to every function, but it still leads to memory leaks. However, when I write a standalone demo, there are no issues. It's very puzzling, and I can't figure it out.
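
Without a repro this is only a guess, but two patterns commonly keep CUDA memory pinned even when every function opens its own scope: tensors that have to outlive their scope (they need MoveToOuterDisposeScope(), after which the caller's scope owns them), and GPU tensors kept alive across iterations, such as a loss history. A sketch with made-up names (Forward, lossHistory):

```csharp
using System.Collections.Generic;
using TorchSharp;
using static TorchSharp.torch;

// All names here are hypothetical; the point is the two lifetime patterns.
static Tensor Forward(nn.Module<Tensor, Tensor> model, Tensor input)
{
    using var scope = torch.NewDisposeScope();
    var logits = model.forward(input);
    // 1) A tensor that must outlive this method has to be handed to the
    //    caller's scope explicitly, otherwise it is disposed on return.
    return logits.MoveToOuterDisposeScope();
}

var device = cuda.is_available() ? CUDA : CPU;
var model = nn.Linear(10, 2).to(device);
var lossHistory = new List<float>();

for (int step = 0; step < 1000; step++)
{
    using var scope = torch.NewDisposeScope();
    var x = randn(32, 10).to(device);
    var y = randn(32, 2).to(device);

    var output = Forward(model, x);
    var loss = nn.functional.mse_loss(output, y);

    // 2) Keep only CPU scalars across iterations. Storing the CUDA tensor
    //    `loss` itself in the list would pin its device memory for the whole
    //    run and look exactly like a leak that scopes cannot fix.
    lossHistory.Add(loss.item<float>());
}
```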

It seems I found the reason: TorchSharp doesn't have torch.cuda.amp, so training runs entirely in full precision and more memory is needed to store the computation graph.
I'm giving up for now. I'll come back to look at my current project after some time; maybe there will be a surprise. 😊😊😊

Torch doesn't release GPU memory back to the driver once it has been allocated, unless you call empty_cache(); the caching allocator holds on to it for future allocations.

This is a duplicate of #892, where I discussed some of the complications with implementing it, so I'll close this issue and keep the old one open.