Vulkan renderer crashes after only a couple thousand draw calls.
magester1 opened this issue · 12 comments
Describe the bug
Vulkan renderer crashes after only a couple thousand draw calls if you use anything but the simplest vertex shader. It seems that the vk renderer is running out of scratch pad memory.
In case it matters, I'm building on Windows 10 (SDK 10.0.20348.0) with clang 16.0.0. I'm not on the latest bgfx master, but I have looked through the commits and there doesn't seem to be any significant change since then (this is where I'm at).
Below are the steps to reproduce with one of the examples, but I'll describe some more first.
The application crashes with:
Exception thrown at 0x00007FFB2EBD14F7 (vcruntime140d.dll) in example-17-drawstress.exe: 0xC0000005: Access violation writing location 0x00000194D4000000.
and this is the call stack:
example-17-drawstress.exe!bx::memCopy(void * _dst, const void * _src, unsigned __int64 _numBytes)
example-17-drawstress.exe!bgfx::vk::ScratchBufferVK::write(const void * _data, unsigned int _size)
example-17-drawstress.exe!bgfx::vk::RendererContextVK::submit(bgfx::Frame * _render, bgfx::ClearQuad & _clearQuad, bgfx::TextVideoMemBlitter & _textVideoMemBlitter)
example-17-drawstress.exe!bgfx::Context::renderFrame(int _msecs)
example-17-drawstress.exe!bgfx::renderFrame(int _msecs)
example-17-drawstress.exe!entry::Context::run(int _argc, const char * const * _argv)
example-17-drawstress.exe!main(int _argc, const char * const * _argv)
example-17-drawstress.exe!invoke_main()
example-17-drawstress.exe!__scrt_common_main_seh()
example-17-drawstress.exe!__scrt_common_main()
example-17-drawstress.exe!mainCRTStartup(void * __formal)
Basically, the problem is in this line. The copy is being called with the values bx::memCopy(&m_data[8386752], ..., 2112), which writes past the end of the buffer: m_size == 8388480.
I can see 2 problems here:
- The assert should probably be checking m_pos < (m_size - _size) or something similar.
- I'm afraid I know almost nothing about Vulkan, but it seems like the scratchpad memory is not sized correctly. I can see here that the size is determined by max draw calls times 128 (where does 128 come from? should this be configurable?). The basic draw stress example never exceeds this because it's only using 64 bytes per draw call (checked via debugger) but when I use a different shader this goes up by a lot.
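As a rough illustration of the first point, here is a minimal standalone sketch of a scratch-buffer write that validates the end of the copy rather than only its start. ScratchBufferSketch and its members are hypothetical stand-ins, not the actual ScratchBufferVK code, and returning false on overflow is an assumption about how a caller might want to handle it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a linear scratch buffer; NOT bgfx's ScratchBufferVK.
struct ScratchBufferSketch
{
	std::vector<uint8_t> m_data;
	uint32_t m_pos = 0;

	explicit ScratchBufferSketch(uint32_t _size) : m_data(_size) {}

	// Copies _size bytes at m_pos and advances m_pos.
	// Returns false, leaving the buffer untouched, when the whole range
	// [m_pos, m_pos + _size) does not fit.
	bool write(const void* _data, uint32_t _size)
	{
		const uint32_t size = uint32_t(m_data.size());
		assert(m_pos <= size);

		// Written as a subtraction so the comparison cannot wrap around.
		if (_size > size - m_pos)
		{
			return false;
		}

		std::memcpy(m_data.data() + m_pos, _data, _size);
		m_pos += _size;
		return true;
	}
};
```

With the values from the debugger (m_size == 8388480, m_pos == 8386752, _size == 2112), this check rejects the write, because only 1728 bytes remain.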
To Reproduce
Steps to reproduce the behavior:
- Modify example 17 (draw stress) to use a different shader like so:
diff --git a/examples/17-drawstress/drawstress.cpp b/examples/17-drawstress/drawstress.cpp
index aa9c6ecf6..49327ab3b 100644
--- a/examples/17-drawstress/drawstress.cpp
+++ b/examples/17-drawstress/drawstress.cpp
@@ -156,11 +156,12 @@ public:
bgfx::RendererType::Enum type = bgfx::getRendererType();
// Create program from shaders.
- m_program = bgfx::createProgram(
- bgfx::createEmbeddedShader(s_embeddedShaders, type, "vs_drawstress")
- , bgfx::createEmbeddedShader(s_embeddedShaders, type, "fs_drawstress")
- , true /* destroy shaders when program is destroyed */
- );
+ //m_program = bgfx::createProgram(
+ // bgfx::createEmbeddedShader(s_embeddedShaders, type, "vs_drawstress")
+ // , bgfx::createEmbeddedShader(s_embeddedShaders, type, "fs_drawstress")
+ // , true /* destroy shaders when program is destroyed */
+ // );
+ m_program = loadProgram("vs_metaballs", "fs_metaballs");
// Create static vertex buffer.
m_vbh = bgfx::createVertexBuffer(
- Run with vulkan renderer
./example-17-drawstress.exe --vk
- Increment number of entities/draw calls until it crashes.
Expected behavior
I assume the application shouldn't crash. Unless there are some limitations with Vulkan? In which case, I think those should be exposed somewhere in the capabilities so we can know what the limit is. But it doesn't look very promising: with a simple shader like vs_metaballs I can only send about 4000 draw calls before reaching the limit, which is about (64<<10) * 128 / 2112 ~ 3971.
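The estimate above can be checked with a few lines of compile-time arithmetic. This is only a sketch: the constant and function names are illustrative, the 64<<10 and 128 figures are the ones quoted in this issue, and any alignment or padding the real buffer applies is ignored.

```cpp
#include <cstdint>

// Figures quoted in the issue; the names are illustrative, not bgfx identifiers.
constexpr uint32_t kMaxDrawCalls     = 64 << 10; // 65536
constexpr uint32_t kBytesPerDrawCall = 128;      // the unexplained "128"
constexpr uint32_t kScratchBytes     = kMaxDrawCalls * kBytesPerDrawCall;

// Number of whole draw calls that fit when each one consumes _bytesPerDraw bytes.
constexpr uint32_t maxDrawsThatFit(uint32_t _bytesPerDraw)
{
	return kScratchBytes / _bytesPerDraw;
}

static_assert(kScratchBytes == 8388608, "64K draw calls * 128 bytes");
static_assert(maxDrawsThatFit(2112) == 3971, "vs_metaballs case from the issue");
static_assert(maxDrawsThatFit(64) == 131072, "original drawstress shader case");
```

Note that 3971 * 2112 == 8386752, which is exactly the m_data offset seen in the crashing memCopy call.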
Note that it works perfectly well with other renderers like d3d or gl.
Screenshots
N/A
Additional context
I think the issue looks pretty straightforward, but please let me know if other information about my system would be useful. I could provide the logs if you think they would help.
Actually, I see the same issue on macOS with the Metal backend enabled: after about 4k draw calls it crashes, so some limit is definitely being hit. However, this happens ONLY in debug mode; in release mode there is no problem:
_platform_memmove 0x00000001978966e8
bx::memCopy(void *, const void *, unsigned long) bx.cpp:52
bgfx::mtl::RendererContextMtl::setShaderUniform(unsigned char, unsigned int, const void *, unsigned int) renderer_mtl.mm:1547
bgfx::mtl::RendererContextMtl::setShaderUniform4x4f(unsigned char, unsigned int, const void *, unsigned int) renderer_mtl.mm:1557
bgfx::ViewState::setPredefined<…>(bgfx::mtl::RendererContextMtl *, unsigned short, const bgfx::mtl::PipelineStateMtl &, const bgfx::Frame *, const bgfx::RenderDraw &) renderer.h:194
bgfx::mtl::RendererContextMtl::submit(bgfx::Frame *, bgfx::ClearQuad &, bgfx::TextVideoMemBlitter &) renderer_mtl.mm:4728
bgfx::Context::renderFrame(int) bgfx.cpp:2470
bgfx::renderFrame(int) bgfx.cpp:1491
bgfx::Context::renderThread(bx::Thread *, void *) bgfx_p.h:3150
bx::Thread::entry() thread.cpp:328
bx::ThreadInternal::threadFunc(void *) thread.cpp:95
_pthread_start 0x0000000197867fa8
------------ BGFX Stats ------------
CPU Frame Time: 9193
CPU Begin Time: 1692529311994101
CPU End Time: 1692529312003243
CPU Timer Frequency: 1000000
GPU Begin Time: 1692529311976854
GPU End Time: 1692529311977518
GPU Timer Frequency: 1000000
Wait Render: 2096
Wait Submit: 22
Draw Calls: 4720
Compute Calls: 0
Blit Calls: 0
Max GPU Latency: 0
GPU Frame Number: 0
Texture Memory Used: 53248
Render Target Memory Used: 0
Transient VB Used: 0
GPU Memory Max: -9223372036854775807
GPU Memory Used: -9223372036854775807
Width: 450
Height: 800
Text Width: 100
Text Height: 28
Number of view stats: 0
Number of encoders used during frame: 1
Primitives Rendered [0]: 9440
Primitives Rendered [1]: 0
Primitives Rendered [2]: 0
Primitives Rendered [3]: 0
Primitives Rendered [4]: 0
------------ End of BGFX Stats ------------
Make debug build and see debug output.
I already have a debug build, that's how I was able to do the analysis in the issue description.
Do you mean to share the log for the debug build? If so then here it is: log.txt
I'm actually getting slightly different behaviour now: it crashes pretty much immediately. Not sure why, I haven't really used Vulkan since I originally created the ticket. But the out-of-bounds access error, stack trace and everything else is the same, so it's still the same issue.
Update your drivers.
I've isolated my problem and fixed it:
Change src/renderer_mtl.mm:1556 (line 1556 in 954c18b):
void setShaderUniform(uint8_t _flags, uint32_t _loc, const void* _val, uint32_t _numRegs)
{
uint32_t offset = 0 != (_flags&kUniformFragmentBit)
? m_uniformBufferFragmentOffset
: m_uniformBufferVertexOffset
;
uint8_t* dst = (uint8_t*)m_uniformBuffer.contents();
bx::memCopy(&dst[offset + _loc], _val, _numRegs*16);
}
to check for the UNIFORM_BUFFER_SIZE before copying the memory:
void setShaderUniform(uint8_t _flags, uint32_t _loc, const void* _val, uint32_t _numRegs)
{
uint32_t offset = 0 != (_flags&kUniformFragmentBit)
? m_uniformBufferFragmentOffset
: m_uniformBufferVertexOffset
;
uint8_t* dst = (uint8_t*)m_uniformBuffer.contents();
// Skip the write if any part of it would land past the end of the buffer.
if (offset + _loc + _numRegs*16 > UNIFORM_BUFFER_SIZE) {
return;
}
bx::memCopy(&dst[offset + _loc], _val, _numRegs*16);
}
Alternatively, I can just increase the buffer size in src/renderer_mtl.mm:19 from:
#define UNIFORM_BUFFER_SIZE (8*1024*1024)
To:
#define UNIFORM_BUFFER_SIZE (24*1024*1024)
@joseph-montanez I'm not entirely sure we are seeing the same issue. I'm not testing this with my code, this is happening with example 17.
The source line where the crash happens is in the issue description: it's trying to write more to the vk scratch memory than is available, and the values are the same regardless of release/debug. Plus there's the incorrect assert in ScratchBufferVK::write that only checks the start address and not the start address plus the length of the copy, although fixing it would still result in a crash via the assert anyway, so it doesn't really matter.
@bkaradzic If you mean my nvidia drivers then they are up to date. Is there any other Vulkan specific driver that I should have and I'm not aware of?
@magester1 That limit of 3971 is EXACTLY the number of quads I could draw on screen: if I went to 3972 nothing else would render, and going beyond 4000 would crash it. That means there is a limit somewhere causing this, and we are both hitting the exact same limit before crashing. Since I am using Metal, my fix will do nothing to help you, but it should help narrow the problem area down to the data that is being passed to the shader. For me it was the unified memory. The VK implementation doesn't have this, and there are several places that could tell you exactly what's wrong, but you need to debug the application to get a stack trace with line numbers. The stack trace you originally provided doesn't have line numbers, so you most likely do not have BGFX compiled/linked with the debug version, which is needed to get the line information to track the issue down further.
But that's what I mean, this is happening because of vk's scratch memory, which I believe has nothing to do with Metal (please correct me if that's wrong). The number being the same seems like a happy coincidence to me, or maybe because bgfx is using this magic "128" for both of them?
Oh I feel like an idiot, I forgot to add the line numbers to the stack trace!! Thank you for pointing that out. Just to clarify, I do have this running in debug mode, and I know exactly which lines are causing the issue (linked in the original description). But I don't know enough about Vulkan to understand the design decision behind the size of the scratch memory, which is why I created this ticket.
Here's the trace with the line numbers. Sorry about that, I didn't realize they were missing:
example-17-drawstress.exe!bx::memCopy(void * _dst, const void * _src, unsigned __int64 _numBytes) (...\bgfx\bx\src\bx.cpp:44)
example-17-drawstress.exe!bgfx::vk::ScratchBufferVK::write(const void * _data, unsigned int _size) (...\bgfx\bgfx\src\renderer_vk.cpp:4644)
example-17-drawstress.exe!bgfx::vk::RendererContextVK::submit(bgfx::Frame * _render, bgfx::ClearQuad & _clearQuad, bgfx::TextVideoMemBlitter & _textVideoMemBlitter) (...\bgfx\bgfx\src\renderer_vk.cpp:8680)
example-17-drawstress.exe!bgfx::Context::renderFrame(int _msecs) (...\bgfx\bgfx\src\bgfx.cpp:2455)
example-17-drawstress.exe!bgfx::renderFrame(int _msecs) (...\bgfx\bgfx\src\bgfx.cpp:1489)
example-17-drawstress.exe!entry::Context::run(int _argc, const char * const * _argv) (...\bgfx\bgfx\examples\common\entry\entry_windows.cpp:521)
example-17-drawstress.exe!main(int _argc, const char * const * _argv) (...\bgfx\bgfx\examples\common\entry\entry_windows.cpp:1185)
example-17-drawstress.exe!invoke_main()
example-17-drawstress.exe!__scrt_common_main_seh()
example-17-drawstress.exe!__scrt_common_main()
example-17-drawstress.exe!mainCRTStartup(void * __formal)
So here is the issue:
uint8_t m_fsScratch[64<<10];
uint8_t m_vsScratch[64<<10];
Take anything that increments in steps of 16 and you get the 3971 limit. BTW, it's also used for...
void setShaderUniform(uint8_t _flags, uint32_t _regIndex, const void* _val, uint32_t _numRegs)
{
if (_flags & kUniformFragmentBit)
{
bx::memCopy(&m_fsScratch[_regIndex], _val, _numRegs*16);
}
else
{
bx::memCopy(&m_vsScratch[_regIndex], _val, _numRegs*16);
}
}
Why the limit... no idea. In my case macOS running on Arm64 doesn't have VRAM since it's all shared memory. I am not sure why this needs to be limited to 64KB for Vulkan.
In my case the main culprit was the m_scratchBuffer scratch buffer, which is created here.
What you highlighted looks like an issue as well, though, and it's a bit odd that it's hardcoded instead of using the BGFX_CONFIG_MAX_DRAW_CALLS macro. I'm not sure what the relationship between the m_scratchBuffer and m_vs/fsScratch buffers is.
But yeah, like you I don't know why this limit exists or how it was determined, especially considering that what goes here depends on the shader size (is it size in number of uniforms?), since with the original example shader it works fine up to the max draw calls.
64k / 16 is 4096. If you're running out of fs/vsScratch that means you're setting over 4k uniforms.
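To make that arithmetic concrete, here is a small sketch, assuming 16-byte uniform registers as implied by the _numRegs*16 copies quoted above. The constant and function names are illustrative only, not bgfx identifiers.

```cpp
#include <cstdint>

constexpr uint32_t kScratchSize   = 64 << 10; // bytes, as in m_vsScratch[64<<10]
constexpr uint32_t kRegisterBytes = 16;       // assumed size of one uniform register

// Total number of registers the scratch array can hold.
constexpr uint32_t kMaxRegisters = kScratchSize / kRegisterBytes;

// True when _numRegs registers starting at byte offset _regIndex
// fit entirely inside the scratch array.
constexpr bool fitsInScratch(uint32_t _regIndex, uint32_t _numRegs)
{
	return _regIndex <= kScratchSize
		&& _numRegs <= (kScratchSize - _regIndex) / kRegisterBytes;
}

static_assert(kMaxRegisters == 4096, "64KB / 16 bytes per register");
```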
I don't think example-17 is setting any uniforms besides the default ones (you know view transformations, etc), so I don't think that's the issue.