VSG calls vkAllocateMemory() too much
timoore opened this issue · 5 comments
Vulkan implementations have a limit on the total number of times that vkAllocateMemory() can be called. On AMD drivers this limit is quite low at 4096 total allocations. I locally modified DeviceMemory.cpp to log a message every time vkAllocateMemory() is called. A little while into a run of vsgCs, we see:
info: device memory allocation 774 16777216 total: 2472903696
info: device memory allocation 775 74736 total: 2472978432
info: device memory allocation 776 74736 total: 2472978576
info: device memory allocation 777 74880 total: 2472979008
info: device memory allocation 778 75024 total: 2472979440
info: device memory allocation 779 75024 total: 2472979728
info: device memory allocation 780 75024 total: 2472980016
info: device memory allocation 781 75168 total: 2472980304
info: device memory allocation 782 75168 total: 2472980448
info: device memory allocation 783 75168 total: 2472980592
info: device memory allocation 784 75312 total: 2472980880
info: device memory allocation 785 75312 total: 2472981024
info: device memory allocation 786 75312 total: 2472981168
info: device memory allocation 787 75600 total: 2472981600
info: device memory allocation 788 75600 total: 2472981888
info: device memory allocation 789 75600 total: 2472982176
info: device memory allocation 790 75888 total: 2472982752
info: device memory allocation 791 75888 total: 2472983040
info: device memory allocation 792 75888 total: 2472983328
info: device memory allocation 793 76032 total: 2472983760
info: device memory allocation 794 76032 total: 2472983904
info: device memory allocation 795 76608 total: 2472984624
info: device memory allocation 796 76608 total: 2472985344
info: device memory allocation 797 76608 total: 2472985920
info: device memory allocation 798 76896 total: 2472986784
info: device memory allocation 799 76896 total: 2472987072
The one large allocation comes from paging in terrain and is made on behalf of a memory pool. The many small allocations come from the transfer task, which doesn't use a pool and allocates memory large enough to copy all the dynamic buffers in use. I don't know why this number is slowly growing. It's very pessimistic with regard to the dynamic data actually transferred; unless LOD fading is enabled, the only dynamic data transferred in a vsgCs frame is the light data.
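For reference, the limit in question is VkPhysicalDeviceLimits::maxMemoryAllocationCount, which can be queried with plain Vulkan. A minimal sketch (not the VSG wrappers) of checking it:

```cpp
// Minimal sketch (plain Vulkan, not VSG API): query the allocation-count limit
// this issue is about. AMD drivers commonly report 4096 here.
#include <vulkan/vulkan.h>
#include <cstdio>

uint32_t queryMaxMemoryAllocationCount(VkPhysicalDevice physicalDevice)
{
    VkPhysicalDeviceProperties properties{};
    vkGetPhysicalDeviceProperties(physicalDevice, &properties);
    std::printf("maxMemoryAllocationCount = %u\n", properties.limits.maxMemoryAllocationCount);
    return properties.limits.maxMemoryAllocationCount;
}
```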
So, there are two bugs:
- TransferTask should allocate its memory from a memory pool.
- Even when pools are used, freed memory is returned to Vulkan rather than to the pool. That's no good, because when the pool runs out another call has to be made to the driver to allocate more memory, eating into the total number of allocations.
In the not-too-long term, almost all the direct calls to DeviceMemory::create() should go away and be replaced by allocations from pools.
In the long term, TransferTask doesn't need a big staging buffer. Highly dynamic buffer data should be stored in memory that is host-visible and thus doesn't need a staging buffer at all.
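As a rough illustration of that last point, data kept in host-visible, host-coherent memory can be updated with a mapped write each frame, with no staging copy. A plain-Vulkan sketch; the names are illustrative, not VSG API, and synchronization with in-flight frames is still required:

```cpp
// Hypothetical sketch of the "no staging buffer" idea for highly dynamic data:
// the destination buffer lives in HOST_VISIBLE | HOST_COHERENT memory and the
// CPU writes straight into it.
#include <vulkan/vulkan.h>
#include <cstring>

void updateDynamicBuffer(VkDevice device, VkDeviceMemory hostVisibleMemory,
                         VkDeviceSize offset, const void* src, VkDeviceSize size)
{
    void* dst = nullptr;
    vkMapMemory(device, hostVisibleMemory, offset, size, 0, &dst);
    std::memcpy(dst, src, size);
    vkUnmapMemory(device, hostVisibleMemory);
    // With HOST_COHERENT memory no vkFlushMappedMemoryRanges call is needed,
    // but the write must still be fenced against frames that read the buffer.
}
```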
The vsg::DeviceMemory and vsg::MemoryBufferPools are both written specifically to batch Vulkan memory allocations and handle memory reuse. This part of the VSG should be capable of handling allocation and reuse just fine.
Could you check what types of objects are being allocated the most in your usage case, to see if specific types of objects are triggering the large number of allocations.
Could you also look at whether you are using a modified version of vsgXchange that disables the sharing of state, as this could lead to an excessive number of DescriptorImage/DescriptorBuffer objects being created.
W.r.t. vsg::TransferTask, its role is multi-faceted. The use of a staging buffer is required for transferring data to the GPU, and also for multi-buffering the data to prevent a frame's dynamic updates on the CPU from inappropriately affecting a concurrently rendered frame. The latter issue is what tripped up the shadow mapping on NVidia cards, as the buffer wasn't being correctly synchronized. By doing away with the buffering and synchronization you open the door to lots of obscure bugs like the shadow mapping one.
> The vsg::DeviceMemory and vsg::MemoryBufferPools are both written specifically to batch Vulkan memory allocations and handle memory reuse. This part of the VSG should be capable of handling allocation and reuse just fine.
OK, but I don't see how memory is returned to a memory pool. It's possible that I am wrong about maxMemoryAllocationCount and that the total number of allocations is decremented by vkFreeMemory. #942 covered a lot of the same ground, and in investigating that I thought that I had established that vkFreeMemory didn't help.
> Could you check what types of objects are being allocated the most in your usage case, to see if specific types of objects are triggering the large number of allocations.
The direct cause of the small allocations is the reallocation of staging buffers by the TransferTask. vsgCs allocates a very small amount of dynamic data, and this should be deallocated when terrain tiles are paged out. I haven't investigated if "unloading" tiles is working as expected, but I generally don't see unbounded memory use when running vsgCs.
The big allocations seem normal; terrain tiles do require texture and geometry data.
> Could you also look at whether you are using a modified version of vsgXchange that disables the sharing of state, as this could lead to an excessive number of DescriptorImage/DescriptorBuffer objects being created.
The only modification to vsgXchange that I'm using is the sRGB textures PR. There are a few models in this scene that are loaded by vsgXchange, but vsgCs doesn't use vsgXchange to build terrain tiles.
> W.r.t. vsg::TransferTask, its role is multi-faceted. The use of a staging buffer is required for transferring data to the GPU, and also for multi-buffering the data to prevent a frame's dynamic updates on the CPU from inappropriately affecting a concurrently rendered frame. The latter issue is what tripped up the shadow mapping on NVidia cards, as the buffer wasn't being correctly synchronized. By doing away with the buffering and synchronization you open the door to lots of obscure bugs like the shadow mapping one.
The TransferTask can't do much if the destination buffers aren't multi-buffered, other than protect the copy operations with a semaphore, and I understand that using staging buffers does simplify that. The alternative would be to use a fence to delay a direct write from the CPU until a safe time.
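A minimal sketch of that fence-based alternative, in plain Vulkan with hypothetical names; the fence is assumed to be the one signalled by the last submission that read the buffer:

```cpp
// Sketch: wait until the GPU work that last used this buffer has completed
// before the CPU overwrites it directly. Illustrative only, not VSG code.
#include <vulkan/vulkan.h>
#include <cstdint>
#include <cstring>

void writeWhenSafe(VkDevice device, VkFence lastUseFence,
                   void* mappedDst, const void* src, size_t size)
{
    // Block until the submission that last read the buffer has finished.
    vkWaitForFences(device, 1, &lastUseFence, VK_TRUE, UINT64_MAX);
    std::memcpy(mappedDst, src, size);
}
```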
The vsg::BufferInfo has a mechanism in it for returning memory blocks. It's a while since I wrote this stuff though ;-)
The vsg::TransferTask shouldn't be thrashing the staging buffer, instead it should only be reallocated when more memory is required, and should be kept around at the max size. If it's thrashing then this is a bug.
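For clarity, the intended grow-only behaviour amounts to something like the following, a hypothetical helper rather than the actual vsg::TransferTask code:

```cpp
// Grow-only reallocation policy sketch: keep the staging buffer at its
// high-water mark so steady-state frames trigger no new vkAllocateMemory calls.
#include <cstddef>

struct StagingBuffer
{
    std::size_t capacity = 0;

    // Returns true if a (re)allocation was required this frame.
    bool reserve(std::size_t requiredSize)
    {
        if (requiredSize <= capacity) return false;  // reuse the existing buffer
        capacity = requiredSize;                     // grow once to the new high-water mark
        // ... recreate the VkBuffer/DeviceMemory at 'capacity' bytes here ...
        return true;
    }
};
```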
I looked at the source of the Vulkan Memory Allocator, and they count vkFreeMemory calls against the total allocations. They ought to know. I changed my debugging output to reflect that, and I now see:
info: VK memory allocations: 93 140688 total: 2798694160
info: VK memory allocations: 93 141264 total: 2798694880
info: VK memory allocations: 93 141264 total: 2798695600
info: VK memory allocations: 93 141264 total: 2798696320
info: VK memory allocations: 93 141408 total: 2798697040
info: VK memory allocations: 93 141408 total: 2798697184
info: VK memory allocations: 93 141408 total: 2798697328
info: VK memory allocations: 93 16777216 total: 2815333280
info: VK memory allocations: 94 141552 total: 2815474832
info: VK memory allocations: 94 141552 total: 2815474976
info: VK memory allocations: 94 141552 total: 2815475120
info: VK memory allocations: 94 141840 total: 2815475552
info: VK memory allocations: 94 142128 total: 2815476128
info: VK memory allocations: 94 142128 total: 2815476704
info: VK memory allocations: 94 142272 total: 2815477424
info: VK memory allocations: 94 142272 total: 2815477856
info: VK memory allocations: 94 142272 total: 2815478000
info: VK memory allocations: 94 142416 total: 2815478288
info: VK memory allocations: 94 142416 total: 2815478432
info: VK memory allocations: 94 143280 total: 2815479440
info: VK memory allocations: 94 143280 total: 2815480448
info: VK memory allocations: 94 143280 total: 2815481312
info: VK memory allocations: 94 143712 total: 2815482608
info: VK memory allocations: 94 143856 total: 2815483184
info: VK memory allocations: 94 143568 total: 2815483040
info: VK memory allocations: 94 143568 total: 2815483328
info: VK memory allocations: 94 144144 total: 2815484192
info: VK memory allocations: 94 144000 total: 2815484336
info: VK memory allocations: 94 144000 total: 2815484768
That's a much more reasonable number for the total number of allocations. The total amount of allocated memory does seem kind of high, but that's something I need to look at in RenderDoc for vsgCs specifically. If there's any issue left, it's the behavior of TransferTask, but that's not critical.
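In other words, the number bounded by maxMemoryAllocationCount is the count of live allocations, so the figures above come from instrumentation along these lines (hypothetical, not the actual DeviceMemory.cpp change):

```cpp
// Hypothetical instrumentation matching the counting described above:
// vkAllocateMemory increments the live count, vkFreeMemory decrements it.
#include <atomic>
#include <cstdint>

std::atomic<int64_t> liveDeviceAllocations{0};
std::atomic<int64_t> totalDeviceBytes{0};

void onAllocate(uint64_t size)
{
    ++liveDeviceAllocations;
    totalDeviceBytes += static_cast<int64_t>(size);
}

void onFree(uint64_t size)
{
    --liveDeviceAllocations;   // vkFreeMemory gives the allocation slot back to the driver
    totalDeviceBytes -= static_cast<int64_t>(size);
}
```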
Incidentally, you might want to look at using the Vulkan Memory Allocator to replace the memory pools and direct DeviceMemory allocations. I know that it's yet another third-party dependency, but it has been hammered on by many developers for several years.
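For comparison, basic VMA usage looks like the sketch below. VMA sub-allocates from large vkAllocateMemory blocks, so a buffer created this way normally doesn't consume an allocation slot. This is a minimal sketch, not an integration with VSG types:

```cpp
// Minimal VMA sketch: create a buffer whose memory is sub-allocated from a
// larger block managed by the allocator.
#include <vk_mem_alloc.h>

VkBuffer createPooledBuffer(VmaAllocator allocator, VkDeviceSize size, VmaAllocation& allocation)
{
    VkBufferCreateInfo bufferInfo{VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO};
    bufferInfo.size = size;
    bufferInfo.usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT;

    VmaAllocationCreateInfo allocInfo{};
    allocInfo.usage = VMA_MEMORY_USAGE_AUTO;

    VkBuffer buffer = VK_NULL_HANDLE;
    vmaCreateBuffer(allocator, &bufferInfo, &allocInfo, &buffer, &allocation, nullptr);
    return buffer;
}
```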
FYI, the VSG's CPU memory allocation and GPU memory allocation share quite a bit of code; just replacing DeviceMemory/MemoryPools with the Vulkan Memory Allocator wouldn't save much code, and it would add another external dependency without gaining functionality we don't already have.
That's not to say we can't learn stuff from the Vulkan Memory Allocator project, just that learning from it will be better than ripping out a bit of the VSG that mostly works already.