hkchengrex/Cutie

VRAM usage with amp

Zarxrax opened this issue · 8 comments

I have been testing the amp setting, and I am a little confused by the results I am seeing. With Cutie's default settings, I see less VRAM usage when amp: is disabled. Only when increasing max_internal_size do I get any VRAM benefit from enabling it.
Each test run was conducted following a fresh restart of the application.

For a short clip with only 79 frames, it used 2.5 GB with amp: True, and only 1.8 GB with amp: False.

For a clip that is 1888 frames long, I left all memory settings at their defaults except that I increased the long-term memory size, so that I could measure the memory usage without it getting purged.
With amp: True, the entire clip completed, but usage ended up right at 12 GB, which is the limit for my GPU.
With amp: False, the entire clip processed and ended up using 11 GB.

With the longer clip again, I increased max_internal_size to 720. This time I did see a huge benefit from amp: True.
With amp: True it was able to process 160 frames before coming to a stop due to running out of VRAM.
With amp: False, it was only able to process about 65 frames.

Taking max_internal_size back down a bit, to 540:
With amp: True I was able to process 1055 frames.
With amp: False I was able to process 755 frames.

So basically what I am seeing is that at a max_internal_size of 480 or lower, AMP is harmful to VRAM usage, and the more you increase max_internal_size, the more benefit it provides.
Can you confirm whether this result makes sense? I am not sure if it is something peculiar to my own system or if this is expected.
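For context, my understanding is that the amp: flag toggles PyTorch's autocast for the forward pass, roughly along these lines (the model and frame below are placeholders I made up, not Cutie's actual code):

```python
import torch
import torch.nn as nn

# Placeholder model and input, just to show the mechanism -- not Cutie's network.
model = nn.Conv2d(3, 16, kernel_size=3, padding=1).cuda().eval()
frame = torch.randn(1, 3, 480, 854, device="cuda")

use_amp = True  # what I assume the amp: config key maps to

with torch.inference_mode():
    with torch.cuda.amp.autocast(enabled=use_amp):
        # With autocast enabled, most activations are computed and stored in
        # float16, which usually lowers peak memory for large inputs.
        out = model(frame)

print(out.dtype)  # torch.float16 when use_amp is True, torch.float32 otherwise
```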

Where are you viewing the VRAM usage from?

The gauge on the right panel: GPU mem (all processes, w/ caching).

Yeah, that's not an accurate measure of how much memory the program "needs". PyTorch caches aggressively, i.e. it reserves more memory than it is actually using. The "torch, w/o caching" gauge is the more important one.
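If you want to see the distinction directly, these are the two counters PyTorch itself exposes (I'm not saying the GUI samples exactly these; the "all proc" figure likely comes from the driver):

```python
import torch

allocated = torch.cuda.memory_allocated()  # bytes held by live tensors -- what the program actually needs
reserved = torch.cuda.memory_reserved()    # bytes reserved by the caching allocator -- always >= allocated

print(f"allocated (w/o caching): {allocated / 1024**3:.2f} GiB")
print(f"reserved  (w/ caching):  {reserved / 1024**3:.2f} GiB")
```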

Alright thanks, I will review it some more.

I guess I am having trouble understanding why the one w/o caching is the one that matters.
When the one w/ caching fills up, processing slows to a crawl; I believe CPU mode would run faster at that point. The one w/o caching displays such a ridiculously small number that I assumed Cutie must really be using more VRAM than it shows.
Can this cache be cleared with something like torch.cuda.empty_cache(), or would that also clear useful data out of memory?

Hmm, I don't think I have seen that happen before. The program only actually uses the portion of GPU memory shown w/o caching. I cannot think of any reason for there to be a significant slowdown... In any case, it should either crash or continue running at normal speed (swapping shouldn't be possible).

You can try torch.cuda.empty_cache() -- it is not going to purge any useful data. However, I don't think it will help unless there is a bug in PyTorch.

The only thing I can think of is that I am on Windows, so maybe it handles the cache differently than Linux does. I am using PyTorch 2.2.

I tried adding a torch.cuda.empty_cache() call when the VRAM got full, and it seemed to work well. There was a short pause while it cleared the cache, and then it continued processing the next frames.
With max_internal_size at 720, it initially had to do this every couple hundred frames, but after clearing the cache a few times the VRAM usage stopped increasing and it processed the remainder of the video without stopping again.
The gauge displaying GPU mem w/o caching never went above 2 GB.
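For reference, what I added was something along these lines (the threshold value and calling it once per processed frame are just my own choices for this sketch, not Cutie settings):

```python
import torch

def maybe_empty_cache(threshold_gib: float = 10.0) -> None:
    # If PyTorch's caching allocator has grown past the threshold, hand the
    # cached-but-unused blocks back to the driver. Live tensors are untouched.
    if torch.cuda.memory_reserved() > threshold_gib * 1024**3:
        torch.cuda.empty_cache()
```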

Back to my original question about AMP: I had someone else who is also on Windows test it as well. They did not have the same findings that I did; they found that AMP consistently filled less of their total VRAM.
So I guess I will just leave AMP turned on. With the cache being emptied, I no longer have any concerns about VRAM usage.

Glad to see that you have a working solution. Unfortunately, I am still not sure what is causing this problem.
Thank you for the detailed report and description -- future users with the same problem should find this issue of great help 😄